Count of the biggest bin in histogram, C#, sharp - histogram

I want to make a histogram of my data, so I am using the Histogram class from MathNet.Numerics.Statistics in C#.
double[] array = { 2, 2, 5,56,78,97,3,3,5,23,34,67,12,45,65 };
Vector<double> data = Vector<double>.Build.DenseOfArray(array);
int binAmount = 3;
Histogram _currentHistogram = new Histogram(data, binAmount);
How can I get the count of the biggest bin, or just the index of the biggest bin? I tried GetBucketOf, but that requires already knowing an element in the bucket :(
Is there any other way to do this? I read the documentation and searched Google, and I can't find anything.

(Hi, I would use a comment for this, but I just joined today and don't yet have 50 reputation to comment!) I just had a look at http://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/Histogram.htm. That documentation page (the footer says it was built using http://docu.jagregory.com/) shows a public property named Item which returns a Bucket. I think that is the property you need: the automatically generated documentation says that Item "Gets the n'th bucket", and Item is the name under which a C# indexer shows up in generated documentation. In C# code you access it as _currentHistogram[n], not _currentHistogram.Item[n]. You can then iterate over the buckets in the histogram using something like -
double countOfBiggest = -1; // Bucket.Count is a double
var indexOfBiggest = -1;
for (var n = 0; n < _currentHistogram.BucketCount; n++)
{
if (_currentHistogram[n].Count > countOfBiggest)
{
countOfBiggest = _currentHistogram[n].Count;
indexOfBiggest = n;
}
}
The code above assumes that Histogram uses 0-based and not 1-based indexing.
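For what it's worth, the scan itself does not depend on Math.NET. Below is a minimal sketch in JavaScript (not C#) of the same idea on the question's data. The equal-width binning in bucketCounts is an assumption made for illustration and may not match Math.NET's exact bucket boundaries; bucketCounts and biggestBucket are hypothetical names.

```javascript
// Hypothetical helper: plain equal-width binning (assumes max > min).
// This is NOT Math.NET's bucket rule, only an illustration.
function bucketCounts(data, bucketCount) {
  const min = Math.min(...data);
  const max = Math.max(...data);
  const width = (max - min) / bucketCount;
  const counts = new Array(bucketCount).fill(0);
  for (const x of data) {
    // Clamp so the maximum value lands in the last bucket.
    const idx = Math.min(Math.floor((x - min) / width), bucketCount - 1);
    counts[idx]++;
  }
  return counts;
}

// Linear scan for the fullest bucket; on ties the first index wins.
function biggestBucket(counts) {
  let indexOfBiggest = -1;
  let countOfBiggest = -1;
  for (let n = 0; n < counts.length; n++) {
    if (counts[n] > countOfBiggest) {
      countOfBiggest = counts[n];
      indexOfBiggest = n;
    }
  }
  return { index: indexOfBiggest, count: countOfBiggest };
}

const data = [2, 2, 5, 56, 78, 97, 3, 3, 5, 23, 34, 67, 12, 45, 65];
const counts = bucketCounts(data, 3);
const biggest = biggestBucket(counts);
console.log(counts);  // [ 8, 4, 3 ]
console.log(biggest); // { index: 0, count: 8 }
```

Because the comparison is strict, two equally full buckets resolve to the lower index; use >= instead if you would rather keep the last one.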


Highlight near duplicates in conditional formatting to highlight values with one character difference

I'm currently using this formula to highlight duplicates in my spreadsheet.
=ARRAYFORMULA(COUNTIF(A$2:$A2,$A2)>1)
Quite simple, it allows me to skip the first occurrence and only highlight 2nd, 3rd, ... occurrences.
I would like the formula to go a bit further and highlight near duplicates as well.
Meaning if there is only one character difference between 2 cells, then it should be considered as a duplicate.
For instance: "Marketing", "Marketng", "Marketingg" and "Market ing" would all be considered the same.
I've made a sample sheet in case my requirement is not straightforward to understand.
Thanks in advance.
Answer
Unfortunately, it is not possible to do this through formulas alone. Apps Script is needed as well. The process for achieving your desired results is described below.
In Google Sheets, go to Extensions > Apps Script, paste the following code [1] and save.
function TypoFinder(range, word) { // created by https://stackoverflow.com/users/19361936
if (!Array.isArray(range) || word == "") {
return false;
}
// Flatten the 2D range and compute the Levenshtein distance from every cell to word.
var distances = range.flat().map(cell => Levenshtein(String(cell), String(word)));
var accumulator = 0;
for (var i = 0; i < distances.length; i++) {
if (distances[i] < 2) {
accumulator++;
} // Keep track of how many times there's a Levenshtein distance of 0 or 1.
}
return accumulator > 1;
}
function Levenshtein(a, b) { // created by https://stackoverflow.com/users/4269081
if (a.length == 0) return b.length;
if (b.length == 0) return a.length;
// swap to save some memory O(min(a,b)) instead of O(a)
if (a.length > b.length) {
var tmp = a;
a = b;
b = tmp;
}
var row = [];
// init the row
for (var i = 0; i <= a.length; i++) {
row[i] = i;
}
// fill in the rest
for (var i = 0; i < b.length; i++) {
var prev = i;
for (var j = 0; j < a.length; j++) {
var val;
if (b.charAt(i) == a.charAt(j)) {
val = row[j]; // match
} else {
val = Math.min(row[j] + 1, // substitution
prev + 1, // insertion
row[j + 1] + 1); // deletion
}
row[j] = prev;
prev = val;
}
row[a.length] = prev;
}
return row[a.length];
}
In cell B2, enter =TypoFinder($A$2:$A2,$A2). Autofill that formula down the column by dragging.
Create a conditional formatting rule for column A. Using Format Rules > Custom Formula, enter =B2:B.
At this point, you might wish to hide column B. To do so, right click on the column and press Hide Column.
The above explanation assumes the column you wish to highlight is Column A and the helper column is column B. Adjust appropriately.
Note that I have assumed you do not wish to highlight repeated blank cells as duplicates. If I am incorrect, remove || word == "" from line 2 of the provided snippet.
Explanation
The concept you have described is called Levenshtein Distance, which is a measure of how close together two strings are. There is no built-in way for Google Sheets to process this, so the Levenshtein() portion of the snippet above implements a custom function to do so instead. Then the TypoFinder() function is built on top of it, providing a method for evaluating a range of data against a specified "correct" word (looking for typos anywhere in the range).
Next, a helper column is used because Sheets has difficulties parsing custom formulas as part of a conditional formatting rule. Finally, the rule itself is implemented to check the helper column's determination of whether the row should be highlighted or not. Altogether, this highlights near-duplicate results in a specified column.
[1] Adapted from duality's answer to a related question.
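To see the near-duplicate rule in isolation, here is a minimal standalone sketch that can be run outside Sheets (for example in Node). It is not the snippet from the answer: levenshtein here is a plain two-row dynamic-programming version, and flagNearDuplicates is a hypothetical helper that mimics the expanding range $A$2:$A2 by comparing each value only against the rows above it.

```javascript
// Standard dynamic-programming Levenshtein distance using two rolling rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // match or substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: flag a value when an earlier row is within distance 1.
// Blank values are never flagged, matching the word == "" check above.
function flagNearDuplicates(column) {
  return column.map((word, row) =>
    word !== "" &&
    column.slice(0, row).some(earlier => levenshtein(earlier, word) < 2)
  );
}

const column = ["Marketing", "Marketng", "Marketingg", "Market ing", "Sales"];
const flags = flagNearDuplicates(column);
console.log(flags); // [ false, true, true, true, false ]
```

The first occurrence stays unflagged, just as with the original COUNTIF rule, and every later value within one edit of an earlier one is highlighted.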

Parsing with Awesomium

The Awesomium answers forum seems pretty much dead, so I'm reposting this here.
First of all, before starting to learn Awesomium I used the HtmlAgilityPack library for all my parsing needs, but the library is not being updated anymore and I decided to move to Awesomium. (so my approach is based on my experience with HAP)
I figured out how to parse lists of objects with Awesomium, but I can't figure out how to work with them. For example:
public dynamic FindNodes(string xpath, dynamic node = null, WebView wv = null)
{
if (wv == null) wv = mainView;
dynamic nodes = (JSObject)wv.ExecuteJavascriptWithResult(String.Format("document.evaluate(\"{0}\", {1}, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null)", xpath, "document"));
int length = nodes.snapshotLength;
for (int i = 0; i < length; i++)
{
Console.WriteLine(nodes.snapshotItem(i).innerText);
}
return nodes;
}
The problems start after I return the nodes. I want to perform a series of searches for each node, so after returning them I decided that the following should work:
dynamic weakCounters = ap.FindNodes("//div[@id='weaklist']/ul/li");
for (int i = 0; i < weakCounters.snapshotLength; i++)
{
ap.FindNodes("//h3[@class='black']", weakCounters.snapshotItem(i));
}
But it did not. It fails at the part where I try to get the length of the list, and of course, if I try to get a snapshot item directly, I get an error as well.
I understand, that I'm making a HUGE mistake somewhere. I just can't understand where.
Edit: Surprisingly, if I do the following, everything seems fine, but it just doesn't look right to create a new variable every time I need to access it (that's just bananas)
dynamic weakCounters = ap.FindNodes("//div[@id='weaklist']/ul/li");
dynamic nodes = weakCounters;
for (int i = 0; i < nodes.snapshotLength; i++)
{
Also, how can I pass the result (element) that I have extracted back to awesomium so that I could do a "subsearch" ?
cross-posted answer from http://answers.awesomium.com/questions/4276/parsing-with-awesomium.html
Why do you need Awesomium for HTML parsing? What's wrong with HtmlAgilityPack?
Download the page with Awesomium (if that is why you need it), get the HTML, and parse it with HtmlAgilityPack.
Parsing like this is likely to be very slow (if it returns many elements).

Twitter JSON losing quotes around properties?

Here's part of my JSON:
[
UserJSONImpl{
id=1489761876,
name='CharlesPerin',
screenName='charles_perin',
location='Paris,
France',
description='PhdStudentatINRIA-Univ.Paris-Sud-CNRS-LIMSI#infovis#dataviz#hci',
isContributorsEnabled=false,
profileImageUrl='http: //a0.twimg.com/profile_images/3766400220/bbced44afe69e60eb30e00f593a2f3b5_normal.jpeg',
profileImageUrlHttps='https: //si0.twimg.com/profile_images/3766400220/bbced44afe69e60eb30e00f593a2f3b5_normal.jpeg',
url='http: //t.co/eYSy04EzEk',
isProtected=false,
},
UserJSONImpl{
id=19671465,
name='KevinQuealy',
screenName='KevinQ',
location='NewYork,
NY',
description='AgraphicseditorattheNewYorkTimes.AdjunctatNYU#SHERP.ReturnedPeaceCorpsvolunteer.Bald,
Minnesotan,
talkstoomuch.',
isContributorsEnabled=false,
profileImageUrl='http: //a0.twimg.com/profile_images/2213326305/image_normal.jpg',
profileImageUrlHttps='https: //si0.twimg.com/profile_images/2213326305/image_normal.jpg',
url='http: //t.co/vb0j99kE3N',
isProtected=false,
...(cont)
This was returned directly from a call to twitter4j's lookupUsers:
long[] hundredIDs = new long[100];
org.json.JSONArray users = new org.json.JSONArray();
for(int a = 0; a < (int)((double)friendArray.length()/100 +1); a++)
{
for(int j = 100*a; j < 100*(a+1); j++)
{
hundredIDs[j-100*a] = Long.parseLong(friendArray.getString(j));
}
users = new org.json.JSONArray(twitter.lookupUsers(hundredIDs)); //lookup users in batches of 100
for(int k = 0; k < users.length(); k++)
{
org.json.JSONObject user = users.getJSONObject(k);
if(Long.parseLong(user.getString("followers_count")) >= 500)
{
String id = user.getString("id"); //get id for each JSONObject
friendArrayFiltered.add(id); //store ids in another array
}
}
For some reason, the JSON returned by my code doesn't have the standard quotes around the property names (I get id=... rather than "id": ...). It doesn't seem to be a problem of the Twitter API itself since their examples are in the correct format: https://dev.twitter.com/docs/api/1/get/users/lookup.
Does anyone know what the problem is?
Also, not sure if this is a consequence but when I attempt to access individual elements of the JSONArray (like JSONArray[0]), an error is returned saying JSONArray[0] is not a JSONObject. Is this linked to the above problem?
It's not JSON, it's actually generated by the UserJSONImpl#toString() method which is providing a textual representation for each of the User objects returned by the lookupUsers invocation.
As for your second problem, you cannot use the [] operator on Object types in Java so I'm a little unclear what you mean without further information.
ASIDE
I'm not sure why you are wrapping twitter4j objects in JSONArray and JSONObject objects - of course you may have a good reason for doing this that's not apparent in the question - but you can simply use the methods directly on the returned objects to get the information you need, for example:
final List<User> users = twitter.lookupUsers(hundredIDs);
for (User user : users) {
final int followersCount = user.getFollowersCount();
if (followersCount >= 500) {
... etc...
Check out the User JavaDocs and wider documentation for the project.

Why can't I use a map's value without having to use a temporary variable?

Ok so this is my scenario:
rascal>map[int, list[int]] g = ();
rascal>g += (1:[2]);
This will result in:
rascal>g[1];
list[int]: [2]
So far so good, but now I wanted to do this, but it didn't work:
rascal>g[1] += 3;
|stdin:///|(2,1,<1,2>,<1,3>): insert into collection not supported on value and int
So I can't directly use the value from g[1] and will have to use a temporary variable like this:
rascal>lst = g[1];
rascal>lst += 3;
rascal>g[1] = lst;
map[int, list[int]]: (1:[2,3])
But doing this every time I want to extend my list is a drag!
Am I doing something wrong or would this be an awesome feature?
Richard
Good question! + on lists is concatenation, not insert, so you could type the following to get the desired effect:
g[1] += [3];

OpenCV Hough strongest lines

Do the HoughLines or HoughLinesP functions in OpenCV return the list of lines in accumulator order like the HoughCircles function does? I would like to know the ordering of lines. It would also be very handy to get the accumulator value for each line, so that an intelligent and adaptive threshold could be used instead of a fixed one. Is either the ordering or the accumulator value available without rewriting OpenCV myself?
HoughTransform orders lines descending by number of votes. You can see the code here
However, the vote count is lost as the function returns - the only way to have it is to modify OpenCV.
The good news is that it is not very complicated - I did it myself once. It's a matter of minutes to change the output from vector< Vec2f > to vector< Vec3f > and populate the last component with the vote count.
Also, you have to modify CvLinePolar to add the third parameter - hough is implemented in C, and there is a wrapper over it in C++, so you have to modify both the implementation and the wrapper.
The main code to modify is here
for( i = 0; i < linesMax; i++ )
{
CvLinePolar line;
int idx = sort_buf[i];
int n = cvFloor(idx*scale) - 1;
int r = idx - (n+1)*(numrho+2) - 1;
line.rho = (r - (numrho - 1)*0.5f) * rho;
line.angle = n * theta;
// add this line, and a field voteCount to CvLinePolar
// DO NOT FORGET TO MODIFY THE C++ WRAPPER
line.voteCount = accum[idx];
cvSeqPush( lines, &line );
}
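If patching OpenCV is not an option, the idea of keeping the vote count next to each line can be tried out on a toy accumulator first. The following is a minimal sketch of a Hough transform over a point list, written in JavaScript and not taken from OpenCV's implementation; houghLinesWithVotes and its parameters (maxRho, rhoStep, thetaSteps, threshold) are hypothetical names chosen for illustration. It returns each line together with its votes, sorted by votes in descending order.

```javascript
// Toy Hough transform (not OpenCV's code): accumulate votes per (theta, rho)
// cell and return every cell at or above the threshold with its vote count.
function houghLinesWithVotes(points, maxRho, rhoStep, thetaSteps, threshold) {
  const numRho = Math.round((2 * maxRho) / rhoStep) + 1;
  const accum = new Int32Array(numRho * thetaSteps);
  for (const [x, y] of points) {
    for (let t = 0; t < thetaSteps; t++) {
      const theta = (Math.PI * t) / thetaSteps;
      const rho = x * Math.cos(theta) + y * Math.sin(theta);
      // Shift rho by maxRho so negative values map to valid indices.
      const r = Math.round((rho + maxRho) / rhoStep);
      accum[t * numRho + r]++;
    }
  }
  const lines = [];
  for (let t = 0; t < thetaSteps; t++) {
    for (let r = 0; r < numRho; r++) {
      const votes = accum[t * numRho + r];
      if (votes >= threshold) {
        lines.push({
          rho: r * rhoStep - maxRho,
          theta: (Math.PI * t) / thetaSteps,
          votes: votes, // the value that is lost in the stock OpenCV output
        });
      }
    }
  }
  // Strongest lines first.
  lines.sort((a, b) => b.votes - a.votes);
  return lines;
}

// Ten points on the horizontal line y = 5, plus three on the vertical line x = 2.
const points = [];
for (let x = 0; x < 10; x++) points.push([x, 5]);
points.push([2, 0], [2, 1], [2, 2]);

// maxRho must be at least the largest distance of any point from the origin.
const lines = houghLinesWithVotes(points, 20, 1, 180, 4);
console.log(lines.slice(0, 3));
```

With the votes in hand, an adaptive threshold could be something like a fraction of lines[0].votes instead of a fixed number. Note that this sketch does no local-maximum suppression, so neighbouring angles of a strong line show up as separate entries.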
