I have a list of words and phrases together with a score and a definition for each. I would like to present this as an interactive wordcloud where the text sizes are determined by the scores and the definitions appear as tooltips on hover. I would prefer to do this in Jupyter.
I know a number of libraries that offer nice ways to generate wordclouds and/or tooltips. How do I attach the tooltips to the words in the wordcloud? The wordcloud needs a way of knowing which text you are hovering over so that it can trigger the corresponding tooltip. I have not found a way to do that so far.
I am fairly agnostic regarding the libraries used to do this.
I mainly want the result to be fairly high-level and mostly declarative.
I have looked at Vega, bqplot and Andreas Mueller's wordcloud package.
Vega has both wordcloud and tooltip functionality and is designed to compose pipelines nicely, but I am not sure how to connect them the right way. I would also prefer to write actual Python code rather than JSON, but that is a minor concern.
Bqplot does tooltips very nicely but does not have a wordcloud component.
The wordcloud package generates nice wordclouds but I do not know how to make them interactive.
I have done this using both ipyvega and Brunel. Brunel is much simpler, but I do not like its wordcloud layout.
Brunel
import pandas as pd

df = pd.DataFrame(data, columns=['word', 'size', 'text'])
%brunel cloud size(size) label(word) tooltip(text)
ipyvega
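from vega import Vega

# width, height and data (a list of (word, text, size) tuples) are assumed to be defined earlier
spec = {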
"$schema": "https://vega.github.io/schema/vega/v3.json",
"name": "wordcloud",
"width": width,
"height": height,
"padding": 0,
"data" : [
{
'name' : 'table',
'values' : [{'word': word, 'text': text, 'size': size}
for word, text, size in data]
}
],
"scales": [
{
"name": "color",
"type": "ordinal",
"range": ["#d5a928", "#652c90", "#939597"]
}
],
"marks": [
{
"type": "text",
"from": {"data": "table"},
"encode": {
"enter": {
"text": {"field": "word"},
"align": {"value": "center"},
"baseline": {"value": "alphabetic"},
"fill": {"scale": "color", "field": "word"},
"tooltip": {"field": "text", "type": "nominal"}
},
"update": {
"fillOpacity": {"value": 1}
},
},
"transform": [
{
"type": "wordcloud",
"size": [width, height],
"text": {"field": "text"},
"font": "Helvetica Neue, Arial",
"fontSize": {"field": "datum.size"},
}
]
}
],
}
Vega(spec)
I'm trying to do something in a Logic App that I think should be quite easy, but I can't get it to work.
Say I have a variable from a previous step: 'https://sharepoint/sites/test-site/Documents/somereport.pdf'. From this string, I need to simply create two variables, the first one containing 'https://sharepoint/sites/test-site', the second one containing 'Documents/somereport.pdf'. Both to be used in subsequent steps.
To achieve this, I tried the following expressions:
join(slice(split(triggerBody()?['Title'], '/'), 0, 5), '/')
join(slice(split(triggerBody()?['Title'], '/'), 5), '/')
However, I get an error: "The template language function 'slice' expects its first parameter to be of type string. The provided value is of type 'Array'."
This is because split returns an array. I've now learned that 'slice' is meant for strings, but is there any similar functionality for an array type? Or is there any other (simple) way to achieve this? This seems like it should be basic functionality, but I cannot figure it out.
There are a few ways to achieve this. If you want to work with the result of split as an array, you can use expressions like the ones below.
1. join(take(split(outputs('Compose')?['Title'][0], '/'), 5),'/')
2. join(take(skip(split(outputs('Compose')?['Title'][0], '/'),5), length(split(outputs('Compose')?['Title'][0], '/'))),'/')
Alternatively, if you want to work with the value as a string, the expressions below also satisfy your requirement.
1. slice(outputs('Compose')?['Title'][0],0,nthIndexOf(outputs('Compose')?['Title'][0],'/',5))
2. slice(outputs('Compose')?['Title'][0],add(nthIndexOf(outputs('Compose')?['Title'][0],'/',5),1),length(outputs('Compose')?['Title'][0]))
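To make the index arithmetic in both approaches easier to follow, here is a minimal sketch of the same logic in Python. It is for illustration only and is not Logic Apps syntax; `nth_index_of` is a hypothetical helper standing in for the `nthIndexOf` expression function.

```python
def nth_index_of(text, sub, n):
    """Return the index of the n-th occurrence of sub in text (n is 1-based), or -1."""
    index = -1
    for _ in range(n):
        index = text.find(sub, index + 1)
        if index == -1:
            return -1
    return index


url = "https://sharepoint/sites/test-site/Documents/somereport.pdf"

# Array approach: split on "/", then take or skip the first five segments and re-join.
parts = url.split("/")
site_from_array = "/".join(parts[:5])   # mirrors join(take(split(...), 5), '/')
path_from_array = "/".join(parts[5:])   # mirrors join(skip(split(...), 5), '/')

# String approach: cut the string at the fifth "/".
cut = nth_index_of(url, "/", 5)
site_from_string = url[:cut]            # mirrors slice(s, 0, nthIndexOf(s, '/', 5))
path_from_string = url[cut + 1:]        # mirrors slice(s, add(nthIndexOf(...), 1), length(s))

print(site_from_array, path_from_array)
```

The count of five comes from the two slashes in "https://" producing an empty segment, so the site URL spans the first five segments of the split.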
Below is the code-view of my logic app
{
"definition": {
"$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
"actions": {
"Compose": {
"inputs": {
"Title": [
"https://sharepoint/sites/test-site/Documents/somereport.pdf"
],
"age": 30,
"name": "John"
},
"runAfter": {},
"type": "Compose"
},
"array_type_-_first_part": {
"inputs": "@join(take(split(outputs('Compose')?['Title'][0], '/'), 5),'/')",
"runAfter": {
"string_type__second_part": [
"Succeeded"
]
},
"type": "Compose"
},
"array_type_-_second_part": {
"inputs": "@join(take(skip(split(outputs('Compose')?['Title'][0], '/'),5), length(split(outputs('Compose')?['Title'][0], '/'))),'/')",
"runAfter": {
"array_type_-_first_part": [
"Succeeded"
]
},
"type": "Compose"
},
"string_type_-_first_part": {
"inputs": "@slice(outputs('Compose')?['Title'][0],0,nthIndexOf(outputs('Compose')?['Title'][0],'/',5))",
"runAfter": {
"Compose": [
"Succeeded"
]
},
"type": "Compose"
},
"string_type__second_part": {
"inputs": "@slice(outputs('Compose')?['Title'][0],add(nthIndexOf(outputs('Compose')?['Title'][0],'/',5),1),length(outputs('Compose')?['Title'][0]))",
"runAfter": {
"string_type_-_first_part": [
"Succeeded"
]
},
"type": "Compose"
}
},
"contentVersion": "1.0.0.0",
"outputs": {},
"parameters": {},
"triggers": {
"manual": {
"inputs": {
"schema": {}
},
"kind": "Http",
"type": "Request"
}
}
},
"parameters": {}
}
I'm pulling data into Google Data Studio from a Google Sheet with dates stored in yyyy-mm-dd format. The dates look correct and calculate correctly with formulas and adjustments everywhere except in a Gantt chart using the Vega-Lite Community Visualization, which shows the date in a long-number format (e.g. 20210520) and is unable to display the data when using "type": "temporal" or "timeUnit": "utcyearmonthdatehours".
I've run various tests, including:
1. Changing the date format for the date columns to plain text, yyyyddmm, yymmdd, and yyyy/mm/dd formats.
2. Replacing the current date columns with new columns using the alternate formats in point 1 (above).
3. Changing the date field formats directly in Google Data Studio to the formats in point 1 (above).
4. Creating a secondary set of date columns in plain text, using an ARRAYFORMULA and the TEXT() function to reformat the actual dates as plain text.
So far, options 2 and 4 are the only ways I've been able to get the Gantt chart to render correctly, reading the data in date format. But option 2 renders the other charts in GDS unusable, as they cannot translate the plain text into usable dates.
Option 4 does work, but isn't the ideal route, given the redundant data. I'd prefer to have just 1 column for the Start Date and another for the End Date, rather than 2 columns for both. Feels like I may be missing something obvious here. Is there a way to either properly format the dates in Google Sheets to work properly with both the GDS date fields and Vega-Lite, or is there a way to properly parse the date data in Vega-Lite without needing to use a second set of plain-text columns?
Report replicating the issue: Project Tracking (debug report)
Edit: below is the code for the Vega-Lite visualizations using the date fields from Google Sheets, which Vega-Lite is not interpreting as dates.
Without timeUnit or temporal field type:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "A bar chart with highlighting on hover and selecting on click. (Inspired by Tableau's interaction style.)",
"config": {
"background": null,
"view": {
"stroke": "transparent"
}
},
"layer": [
{
"layer": [
{
"params": [
{
"name": "grid",
"select": "interval",
"bind": "scales"
}
],
"mark": {
"type": "bar",
"cursor": "pointer",
"tooltip": true,
"point": true,
"cornerRadiusEnd": 5,
"opacity": 0.8
},
"encoding": {
"color": {
"field": "$dimension3",
"title": "$dimension3.name"
}
}
}
],
"encoding": {
"x": {
"field": "$dimension0",
"axis": {
"title": null,
"grid": true
}
},
"y": {
"field": "$dimension1",
"title": "$dimension1.name",
"type": "nominal",
"sort": "x",
"axis": {
"title": null,
"grid": true,
"tickBand": "extent"
}
},
"x2": {
"field": "$dimension2"
},
"yOffset": {
"field": "$dimension3"
}
}
}
]
}
With timeUnit and field type temporal:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "A bar chart with highlighting on hover and selecting on click. (Inspired by Tableau's interaction style.)",
"config": {
"background": null,
"view": {
"stroke": "transparent"
}
},
"layer": [
{
"layer": [
{
"params": [
{
"name": "grid",
"select": "interval",
"bind": "scales"
}
],
"mark": {
"type": "bar",
"cursor": "pointer",
"tooltip": true,
"point": true,
"cornerRadiusEnd": 5,
"opacity": 0.8
},
"encoding": {
"color": {
"field": "$dimension3",
"title": "$dimension3.name"
}
}
}
],
"encoding": {
"x": {
"field": "$dimension0",
"type": "temporal",
"timeUnit": "utcyearmonthdatehours",
"axis": {
"title": null,
"grid": true
}
},
"y": {
"field": "$dimension1",
"title": "$dimension1.name",
"type": "nominal",
"sort": "x",
"axis": {
"title": null,
"grid": true,
"tickBand": "extent"
}
},
"x2": {
"field": "$dimension2"
},
"yOffset": {
"field": "$dimension3"
}
}
}
]
}
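The long-number value described in the question above (e.g. 20210520) looks like the date packed as yyyymmdd. Below is a minimal sketch of the arithmetic needed to unpack such a value, written in Python purely for illustration. The function name is hypothetical, and whether the Community Visualization hands the value to Vega-Lite as a number or as a string is an assumption that the question does not confirm.

```python
from datetime import date


def parse_yyyymmdd(value):
    """Unpack a yyyymmdd value (int or digit string) into a date."""
    number = int(value)
    year, rest = divmod(number, 10000)   # 20210520 -> 2021, 520
    month, day = divmod(rest, 100)       # 520 -> 5, 20
    return date(year, month, day)        # raises ValueError for impossible dates


start = parse_yyyymmdd(20210520)
print(start.isoformat())
```

Any parsing done on the Vega-Lite side would have to perform the same decomposition before the field can be treated as temporal.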
The drawGeometry method will not apply a point style to a polygon. That makes sense but I have a requirement that a polygon feature should be represented with both a stroke+fill style suitable for the feature as well as a point style, IMO, less-suitable for the feature.
For example:
Using a combination of polygon and point feature styles:
[
{
"fill": {
"pattern": {
"orientation": "diagonal",
"color": "rgba(230,113,26,1)",
"spacing": 3,
"repitition": "repeat"
}
}
},
{
"circle": {
"fill": { "color": "blue" },
"opacity": 1,
"stroke": {
"color": "rgba(0,255,0,1)",
"width": 1
},
"radius": 20
}
},
{
"image": {
"anchor": [
16,
48
],
"imgSize": [
32,
48
],
"anchorXUnits": "pixels",
"anchorYUnits": "pixels",
"src": "http://openlayers.org/en/v3.17.1/examples/data/icon.png"
}
}
]
One solution I've come up with is to replace the drawGeometry to call both drawPolygon and drawPoint:
But it seems like rendering a polygon with a point style should be supported some other way. Maybe the drawPolygon implementation should detect a point style and react accordingly?
Use the geometry option of the style object and return the polygon's interior point from it with geometry.getInteriorPoint(). See http://openlayers.org/en/latest/examples/polygon-styles.html?q=geometry+style for an example.
I know that it is possible to include a hyperlink in an Apple News Format article using markdown by doing the following:
{
"version": "1.0",
"identifier": "sketchyTech_Demo",
"title": "My First Article",
"language": "en",
"layout": {},
"components": [
{
"role": "title",
"text": "My First Article",
"textStyle": "titleStyle"
},
{
"role": "body",
"format": "markdown",
"text": "Here's a [hyperlink](http://sketchytech.blogspot.co.uk).",
"textStyle": "bodyStyle"
}
],
"componentTextStyles": {
"titleStyle": {
"textAlignment": "center",
"fontName": "HelveticaNeue-Bold",
"fontSize": 64,
"lineHeight": 74,
"textColor": "#000"
},
"bodyStyle": {
"textAlignment": "left",
"fontName": "Georgia",
"fontSize": 18,
"lineHeight": 26,
"textColor": "#000"
}
}
}
but in Apple News Format there is also a Link Addition type, which I presume should work in a similar way to inline text styles, which are placed inside a component like this:
{
"role": "title",
"text": "My First Article",
"textStyle": "titleStyle",
"inlineTextStyles": [{
"rangeStart": 3,
"rangeLength": 5,
"textStyle": {
"textColor": "#FF00007F"
}
}]
}
Apple provides sample code:
{
"type": "link",
"URL": "http://www.apple.com",
"rangeStart": 0,
"rangeLength": 20
}
But it doesn't give any instructions on where it should be placed like the other elements do. It is also rather mysterious that it has a "type" key, which is unlike other elements. Not only this but in the type description it is described as a LinkAddition in all uppercase. I have tried various combinations, the most obvious of which I would guess to be
"linkAdditions": [{
"type": "link",
"URL": "http://www.apple.com",
"rangeStart": 0,
"rangeLength": 20
}]
added to a component in the same way as inlineTextStyles (because a block of text could have multiple links, just as it can have multiple styles) but I can't get this or any variants that I've tried to work. Is it perhaps that News Preview isn't yet capable of rendering this?
Add it inside the component under the "additions" property, rather than under "linkAdditions" as you expected.
For example, this should work:
...
"role": "body",
"text": "Article text goes here and here",
"additions": [{
"type": "link",
"URL": "http://www.apple.com",
"rangeStart": 0,
"rangeLength": 20
}],
...
Note: if the format is markdown it will ignore the additions property.
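Counting rangeStart and rangeLength by hand is error-prone, so here is a minimal Python sketch that derives them from the linked phrase and builds the component. The helper name `link_addition` is hypothetical, and the sketch assumes plain ASCII text so that Python string offsets match the offsets Apple News expects.

```python
import json


def link_addition(text, phrase, url):
    """Build a link addition covering the first occurrence of phrase in text."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in component text")
    return {
        "type": "link",
        "URL": url,
        "rangeStart": start,
        "rangeLength": len(phrase),
    }


text = "Article text goes here and here"
component = {
    "role": "body",
    "text": text,
    # The link goes under "additions", as described in the answer above.
    "additions": [link_addition(text, "Article text", "http://www.apple.com")],
}

print(json.dumps(component, indent=2))
```

The resulting dictionary can be placed in the article's "components" array in the same way as the hand-written example.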
I'm trying to use Spearmint, the Bayesian optimization library, to tune hyperparameters for a machine learning classifier. My question is: how does one express parameter search spaces that do not follow a uniform distribution?
From the project's GitHub page, here is an example of how to set two uniformly distributed parameter search spaces:
"variables": {
"X": {
"type": "FLOAT",
"size": 1,
"min": -5,
"max": 10
},
"Y": {
"type": "FLOAT",
"size": 1,
"min": 0,
"max": 15
}
}
How would we define a search space like the one below in Spearmint?
SVC_PARAMS = [
{
"bounds": {
"max": 10.0,
"min": 0.01,
},
"name": "C",
"type": "double",
"transformation": "log",
},
{
"bounds": {
"max": 1.0,
"min": 0.0001,
},
"name": "gamma",
"type": "double",
"transformation": "log",
},
{
"type": "categorical",
"name": "kernel",
"categorical_values": [
{"name": "rbf"},
{"name": "poly"},
{"name": "sigmoid"},
],
},
]
Is there a place to look up all of the stochastic expressions (i.e. uniform, normal, log, etc.) currently supported by Spearmint?
Spearmint learns these kinds of transformations automatically from the data. If you take a look here: https://github.com/HIPS/Spearmint/tree/master/spearmint/transformations
you can see the implementation of the beta warping that is applied (detailed in this paper: http://arxiv.org/abs/1402.0929). Spearmint doesn't have a way to specify these a priori, but you could have Spearmint operate on e.g. the log of the parameters (by giving it the log of the parameter ranges and exponentiating on your end).
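The log-space workaround can be sketched as follows. This is a minimal illustration rather than Spearmint's own API: the `log10_` prefix and both helper names are hypothetical, only the "type", "size", "min" and "max" keys come from the config example in the question, and the parameter values are treated as plain floats for simplicity.

```python
import json
import math

# Ranges from the question that should be searched on a log scale.
LOG_SCALED = {"C": (0.01, 10.0), "gamma": (0.0001, 1.0)}


def log_space_variables(ranges):
    """Build FLOAT variable entries whose bounds are log10 of the real bounds."""
    variables = {}
    for name, (low, high) in ranges.items():
        variables["log10_" + name] = {
            "type": "FLOAT",
            "size": 1,
            "min": math.log10(low),
            "max": math.log10(high),
        }
    return variables


def to_original_scale(params):
    """Exponentiate the log10_* values suggested by the optimizer."""
    prefix = "log10_"
    return {
        name[len(prefix):]: 10.0 ** value
        for name, value in params.items()
        if name.startswith(prefix)
    }


variables = log_space_variables(LOG_SCALED)
print(json.dumps({"variables": variables}, indent=2))
```

Inside the objective function you would call something like `to_original_scale` on the suggested values before constructing the classifier, so that the optimizer searches uniformly over the exponents while the model sees the real C and gamma.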