Parsing with Awesomium - parsing

The Awesomium answers forum seems pretty much dead, so I'm reposting this here.
First of all, before I started learning Awesomium I used the HtmlAgilityPack library for all my parsing needs, but that library is no longer being updated, so I decided to move to Awesomium (which means my approach is based on my experience with HAP).
I figured out how to parse lists of objects with Awesomium, but I can't figure out how to work with them. For example:
public dynamic FindNodes(string xpath, dynamic node = null, WebView wv = null)
{
if (wv == null) wv = mainView;
dynamic nodes = (JSObject)wv.ExecuteJavascriptWithResult(String.Format("document.evaluate(\"{0}\", {1}, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null)", xpath, "document"));
int length = nodes.snapshotLength;
for (int i = 0; i < length; i++)
{
Console.WriteLine(nodes.snapshotItem(i).innerText);
}
return nodes;
}
The problems start after I return the nodes. I want to perform a series of searches for each node, so after returning them I decided that the following should work:
dynamic weakCounters = ap.FindNodes("//div[@id='weaklist']/ul/li");
for (int i = 0; i < weakCounters.snapshotLength; i++)
{
ap.FindNodes("//h3[@class='black']", weakCounters.snapshotItem(i));
}
But it did not. The part where I try to get the length of the list fails, and of course if I try to get a snapshot item directly I get an error as well.
I understand that I'm making a HUGE mistake somewhere. I just can't understand where.
Edit: Surprisingly, if I do the following, everything seems fine, but it just doesn't look right to create a new variable every time I need to access the result (that's just bananas):
dynamic weakCounters = ap.FindNodes("//div[@id='weaklist']/ul/li");
dynamic nodes = weakCounters;
for (int i = 0; i < nodes.snapshotLength; i++)
{
Also, how can I pass the result (element) that I have extracted back to Awesomium so that I can do a "subsearch"?

cross-posted answer from http://answers.awesomium.com/questions/4276/parsing-with-awesomium.html
Why do you need Awesomium for HTML parsing? What's wrong with HtmlAgilityPack?
Download the page with Awesomium (if that is why you need it), get the HTML, and parse it with HtmlAgilityPack.
Parsing like this is likely to be very slow (if it returns many elements).
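On the "subsearch" part of the question, one option is to keep the whole nested lookup inside the page's JavaScript and hand only plain strings back to C#. This is a rough sketch, not something taken from the Awesomium documentation. document.evaluate takes a context node as its second argument, so a relative XPath (one starting with a dot) can be evaluated against each li. Note that an XPath starting with // searches the whole document even when a context node is passed. The helper name findNested and its doc parameter are made up for illustration; in the page you would pass document, and the script would presumably be run through ExecuteJavascriptWithResult as in the question.

```javascript
// XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE has the numeric value 6 in the DOM spec.
var UNORDERED_NODE_SNAPSHOT_TYPE = 6;

// Hypothetical helper: for every node matched by outerXPath, evaluate
// innerXPath relative to that node and collect the innerText of the matches.
// `doc` is the page's `document` (passed in so the function can be tried with a stub).
function findNested(doc, outerXPath, innerXPath) {
  var outer = doc.evaluate(outerXPath, doc, null, UNORDERED_NODE_SNAPSHOT_TYPE, null);
  var results = [];
  for (var i = 0; i < outer.snapshotLength; i++) {
    var contextNode = outer.snapshotItem(i);
    // The second argument is the context node, so ".//h3" means "h3 below this node".
    var inner = doc.evaluate(innerXPath, contextNode, null, UNORDERED_NODE_SNAPSHOT_TYPE, null);
    var texts = [];
    for (var j = 0; j < inner.snapshotLength; j++) {
      texts.push(inner.snapshotItem(j).innerText);
    }
    results.push(texts);
  }
  return results; // one array of strings per outer node
}

// In the page:
// findNested(document, "//div[@id='weaklist']/ul/li", ".//h3[@class='black']");
```

Returning nested arrays of strings rather than DOM nodes avoids having to hold on to page objects on the C# side at all.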

Related

Script fails when running normally but in debug its fine

I'm developing a Google spreadsheet that automatically requests information from a site; the code is below. The variable 'tokens' is an array of about 60 unique three-letter identifiers. The problem is that the code keeps failing to retrieve all of the information from the site. Instead it falls back (at random) on the validation part and fills the array with "ERROR!" strings. Sometimes it's row 5, then rows 10-12, then row 3, then multiple rows, and so on. When I run it in debug mode everything is fine, and I can't reproduce the problem.
I already tried adding a sleep (100 ms), but that fixed nothing. I also looked at the amount of traffic the API accepts (10 requests per second, 1,200 per minute, 100,000 per day), so that shouldn't be a problem.
Runtime is limited, so I need it to be as efficient as possible. I suspect it is an issue of computational power after I push all the values from the JSON response into the 'tokens' array. Is there a way to let the script wait as long as necessary for the changes to be committed?
function newGetOrders() {
var starttime = new Date().getTime().toString();
var refreshTime = new Date();
var tokens = retrieveTopBin();
var sheet = SpreadsheetApp.openById('aaafFzbXXRzSi-eXBu9Xh81Ne2r09vM8rLFkA4fY').getSheetByName("Sheet37");
sheet.getRange('A2:OL101').clear();
for (var i=0; i<tokens.length; i++) {
var request = UrlFetchApp.fetch("https://api.binance.com/api/v1/depth?symbol=" + tokens[i][0] + "BTC", {muteHttpExceptions:true});
var json = JSON.parse(request.getContentText());
tokens[i].push(refreshTime);
Utilities.sleep(100);
for (var k in json.bids) {
tokens[i].push(json.bids[k][0]);
tokens[i].push(json.bids[k][1]);
}
for (var k in json.asks) {
tokens[i].push(json.asks[k][0]);
tokens[i].push(json.asks[k][1]);
}
if (tokens[i].length < 402) {
for (var x=tokens[i].length; x<402; x++) {
tokens[i].push("ERROR!");
}
}
}
sheet.getRange(2, 1, tokens.length, 402).setValues(tokens);
}
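One thing that may be worth checking: with muteHttpExceptions:true a rejected request (a rate-limit response, for example) does not throw, so json.bids and json.asks are simply missing and the row ends up padded with "ERROR!". Below is a minimal sketch of checking the status code and retrying with a growing delay. The names fetchJsonWithRetry, fetchFn and sleepFn, and the retry numbers in the usage comment, are illustrative assumptions and not part of the original script; in Apps Script fetchFn would wrap UrlFetchApp.fetch and sleepFn would wrap Utilities.sleep.

```javascript
// Hypothetical helper: call fetchFn(url) until it returns HTTP 200 or attempts run out.
// fetchFn must return an object with getResponseCode() and getContentText(),
// which is the shape of the HTTPResponse that UrlFetchApp.fetch returns.
function fetchJsonWithRetry(fetchFn, sleepFn, url, maxAttempts, baseDelayMs) {
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    var response = fetchFn(url);
    if (response.getResponseCode() === 200) {
      return JSON.parse(response.getContentText());
    }
    if (attempt < maxAttempts) {
      // Wait longer after each failure: baseDelayMs, then 2x, then 4x, ...
      sleepFn(baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
  return null; // every attempt failed; the caller decides how to mark the row
}

// Possible use inside the loop from the question (Apps Script only):
// var json = fetchJsonWithRetry(
//   function (u) { return UrlFetchApp.fetch(u, {muteHttpExceptions: true}); },
//   function (ms) { Utilities.sleep(ms); },
//   "https://api.binance.com/api/v1/depth?symbol=" + tokens[i][0] + "BTC",
//   4, 250);
```

Passing the fetch and sleep functions in as arguments keeps the helper free of Apps Script globals, so the retry logic can be exercised outside the spreadsheet with a fake response object.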

Count of the biggest bin in histogram, C#, sharp

I want to make a histogram of my data, so I use the Histogram class from MathNet.Numerics.Statistics in C#.
double[] array = { 2, 2, 5,56,78,97,3,3,5,23,34,67,12,45,65 };
Vector<double> data = Vector<double>.Build.DenseOfArray(array);
int binAmount = 3;
Histogram _currentHistogram = new Histogram(data, binAmount);
How can I get the count of the biggest bin? Or just the index of the biggest bin? I tried to get it using GetBucketOf, but to do that I need an element that is in that bucket :(
Is there any other way to do this? I read the documentation and searched Google, and I can't find anything.
(Hi, I would have used a comment for this, but I just joined today and don't yet have the 50 reputation needed to comment!) I just had a look at http://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/Histogram.htm. That documentation page (the footer says it was built using http://docu.jagregory.com/) shows a public property named Item which returns a Bucket. I think that is the property you need: the automatically generated documentation says that Item "Gets the n'th bucket", but it doesn't make clear that Item is the indexer. In C# an indexer that documentation lists as Item is accessed with square brackets on the object itself, i.e. _currentHistogram[n] rather than _currentHistogram.Item[n]. You could then iterate over the buckets in the histogram using something like:
double countOfBiggest = -1; // Bucket.Count is a double
int indexOfBiggest = -1;
for (int n = 0; n < _currentHistogram.BucketCount; n++)
{
if (_currentHistogram[n].Count > countOfBiggest)
{
countOfBiggest = _currentHistogram[n].Count;
indexOfBiggest = n;
}
}
The code above assumes that Histogram uses 0-based and not 1-based indexing.

Twitter JSON losing quotes around properties?

Here's part of my JSON:
[
UserJSONImpl{
id=1489761876,
name='CharlesPerin',
screenName='charles_perin',
location='Paris,
France',
description='PhdStudentatINRIA-Univ.Paris-Sud-CNRS-LIMSI#infovis#dataviz#hci',
isContributorsEnabled=false,
profileImageUrl='http: //a0.twimg.com/profile_images/3766400220/bbced44afe69e60eb30e00f593a2f3b5_normal.jpeg',
profileImageUrlHttps='https: //si0.twimg.com/profile_images/3766400220/bbced44afe69e60eb30e00f593a2f3b5_normal.jpeg',
url='http: //t.co/eYSy04EzEk',
isProtected=false,
},
UserJSONImpl{
id=19671465,
name='KevinQuealy',
screenName='KevinQ',
location='NewYork,
NY',
description='AgraphicseditorattheNewYorkTimes.AdjunctatNYU#SHERP.ReturnedPeaceCorpsvolunteer.Bald,
Minnesotan,
talkstoomuch.',
isContributorsEnabled=false,
profileImageUrl='http: //a0.twimg.com/profile_images/2213326305/image_normal.jpg',
profileImageUrlHttps='https: //si0.twimg.com/profile_images/2213326305/image_normal.jpg',
url='http: //t.co/vb0j99kE3N',
isProtected=false,
...(cont)
This was returned directly from a call to twitter4j's lookupUsers:
long[] hundredIDs = new long[100];
org.json.JSONArray users = new org.json.JSONArray();
for(int a = 0; a < (int)((double)friendArray.length()/100 +1); a++)
{
for(int j = 100*a; j < 100*(a+1); j++)
{
hundredIDs[j-100*a] = Long.parseLong(friendArray.getString(j));
}
users = new org.json.JSONArray(twitter.lookupUsers(hundredIDs)); //lookup users in batches of 100
for(int k = 0; k < users.length(); k++)
{
org.json.JSONObject user = users.getJSONObject(k);
if(Long.parseLong(user.getString("followers_count")) >= 500)
{
String id = user.getString("id"); //get id for each JSONObject
friendArrayFiltered.add(id); //store ids in another array
}
}
For some reason, the JSON returned by my code doesn't have the standard quotes around the properties (id=... rather than "id": ...). It doesn't seem to be a problem with the Twitter API itself, since their examples are in the correct format: https://dev.twitter.com/docs/api/1/get/users/lookup.
Does anyone know what the problem is?
Also, not sure if this is a consequence but when I attempt to access individual elements of the JSONArray (like JSONArray[0]), an error is returned saying JSONArray[0] is not a JSONObject. Is this linked to the above problem?
It's not JSON. It's actually generated by the UserJSONImpl#toString() method, which provides a textual representation of each of the User objects returned by the lookupUsers invocation.
As for your second problem, you cannot use the [] operator on Object types in Java, so I'm a little unclear what you mean without further information.
ASIDE
I'm not sure why you are wrapping twitter4j objects in JSONArray and JSONObject objects - of course you may have a good reason for doing this that's not apparent in the question - but you can simply use the methods directly on the returned objects to get the information you need, for example:
final List<User> users = twitter.lookupUsers(hundredIDs);
for (User user : users) {
final int followersCount = user.getFollowersCount();
if (followersCount >= 500) {
... etc...
Check out the User JavaDocs and wider documentation for the project.

How can I replace the text of a bookmark when the bookmark is in a table?

I use this code to replace text of bookmark in word :
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open("doc3.docx", true))
{
var bookmarkStarts = wordDoc.MainDocumentPart.Document.Body.Descendants<BookmarkStart>();
foreach (var start in bookmarkStarts)
{
OpenXmlElement elem = item.NextSibling();
while (elem != null && !(elem is BookmarkEnd))
{
OpenXmlElement nextElem = elem.NextSibling();
elem.Remove();
elem = nextElem;
}
item.Parent.InsertBefore<Run>(new Run(new Text("Hello")), item);
}
wordDoc.Close();
}
But this does not work when the bookmark is in a table.
Have you checked that you don't delete any bookmarks with your approach?
I ran your test code after editing it a little (there is no variable named item in your example code), and I successfully inserted Hello into two bookmarks outside a table and two inside a table, without any issues.
Which leads me to believe your problem lies elsewhere.
Have you looked at the Open XML in your document after you've run your program?
Are there any errors?
I've seen bookmarks placed in the oddest places in a Word document when you leave the placement to Word rather than doing it yourself.
You can also end up with bookmarks overlapping each other, like this:
<bookmark1 start><xml xml><bookmark2 start><bookmark1 end><xml xml><bookmark2 end>
If you run into that case, your code will delete the bookmarkstart 2 before it reaches bookmarkend 1, and that will cause your bookmark to not be replaced.
You'll easily run into that problem with larger complex documents.
The way I solved it was to "sort" the bookmarks before doing any editing.
So the example above would become
<bookmark1 start><xml xml><bookmark1 end><bookmark2 start><xml xml><bookmark2 end>
after the sort.
The code I use to do this looks like this:
var bookmarks = mainPart.Document.Body.Descendants<BookmarkStart>();
for (int i = 0; i < bookmarks.Count(); i++)
{
var bks = bookmarks.ElementAt(i);
var next = bks.NextSibling();
if (next is BookmarkEnd)
{
var bme = (BookmarkEnd)next;
if (int.Parse(bks.Id) - int.Parse(bme.Id) == 1)
{
var copy = (BookmarkEnd)next.Clone();
bks.Parent.RemoveChild<BookmarkEnd>(bme);
bks.Parent.InsertBefore<BookmarkEnd>(copy, bks);
}
}
}
I'll admit this isn't totally foolproof, but it has worked well for me.
Another check you can add to your replace method avoids deleting bookmarks.
It makes sure you don't delete BookmarkStart elements as you remove elements when inserting text:
while (elem != null && !(elem is BookmarkEnd)) // remove elements
{
OpenXmlElement nextElem = elem.NextSibling();
if (elem.LocalName != "bookmarkStart")
elem.Remove();
elem = nextElem;
}
Good luck :)

C++ memory issue

I'm currently building a prime number finder, and am having a memory problem:
This may be due to a corruption of the heap, which indicates a bug in PrimeNumbers.exe or any of the DLLs it has loaded.
PS. Please don't tell me whether or not this is the right way to find prime numbers, I want to figure that out myself!
Code:
// PrimeNumbers.cpp : main project file.
#include "stdafx.h"
#include <vector>
using namespace System;
using namespace std;
int main(array<System::String ^> ^args)
{
Console::WriteLine(L"Until what number do you want to stop?");
signed const int numtstop = Convert::ToInt16(Console::ReadLine());
bool * isvalid = new bool[numtstop];
int allattempts = numtstop*numtstop; // Find all the possible combinations of numbers
for (int currentnumb = 0; currentnumb <= allattempts; currentnumb++) // For each number try to find a combination
{
for (int i = 0; i <= numtstop; i++)
{
for (int tnumb = 0; tnumb <= numtstop; tnumb++)
{
if (i*tnumb == currentnumb)
{
isvalid[currentnumb] = false;
Console::WriteLine("Error");
}
}
}
}
Console::WriteLine(L"\nAll prime number in the range of:" + Convert::ToString(numtstop));
for (int pnts = 0; pnts <= numtstop; pnts++)
{
if (isvalid[pnts] != false)
{
Console::WriteLine(pnts);
}
}
return 0;
}
I don't see the memory problem.
Please help.
You are allocating numtstop booleans, but you index that array using a variable that ranges from zero to numtstop*numtstop. This will be severely out of bounds for all numtstop values greater than 1.
You should either allocate more booleans (numtstop*numtstop, plus one because the loop bound is inclusive) or use a different variable to index into isvalid (for example, i, which ranges from 0 to numtstop). I am sorry I cannot be more precise than that, because of your request not to comment on your algorithm for finding primes.
P.S. If you would like to read something on the topic of finding small primes, here is a link to a great book by Dijkstra. He teaches you how to construct a program for the first 1000 primes on pages 35..49.
The problem is that you use native C++ in managed C++/CLI code, and of course that you use new without delete.
currentnumb grows bigger than the size of the array, which is just numtstop. You are probably going out of bounds, and this might be your issue.
You never delete[] your isvalid local, which is a memory leak.

Resources