Node-RED: parse data from webpage - parsing

I need to get data from an online HTML table, parse it, and find some values in it. The data I need is inside <table class="zjrtbl" border="0">. This is the page I want to parse; it is a timetable for a local bus stop.
How do I get this table into a variable I can work with?
How do I parse the data so that I end up with, let's say, a 2D array of this table?
EDIT 2:
I now have this setup:
[{"id":"a9fffc.914a1008","type":"inject","z":"2988145.2ee976c","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"x":120,"y":140,"wires":[["889103ea.58886"]]},{"id":"889103ea.58886","type":"http request","z":"2988145.2ee976c","name":"","method":"GET","ret":"txt","url":"http://jizdnirady.idnes.cz/ceskebudejovice/zjr/?date=9.12.2016%20P%C3%A1&l=Trol%205&f=Strakonick%C3%A1%20-%20obchodn%C3%AD%20z%C3%B3na&t=Ro%C5%BEnov%20-%20to%C4%8Dna&wholeweek=true&ttn=CesBud&submit=true","tls":"","x":290,"y":140,"wires":[["9e5d61b1.d8747"]]},{"id":"9e5d61b1.d8747","type":"html","z":"2988145.2ee976c","name":"","tag":".zjrtbl","ret":"text","as":"single","x":430,"y":140,"wires":[["2ff681d4.5dcade"]]},{"id":"db94b76f.32ea58","type":"http in","z":"2988145.2ee976c","name":"","url":"/idos","method":"get","swaggerDoc":"","x":120,"y":100,"wires":[["889103ea.58886"]]},{"id":"5c2e34b0.692dbc","type":"http response","z":"2988145.2ee976c","name":"http","x":1170,"y":140,"wires":[]},{"id":"a9aa5336.dbaaa","type":"function","z":"2988145.2ee976c","name":"connector","func":"msg.payload = msg.payload;\nreturn msg;","outputs":1,"noerr":0,"x":840,"y":140,"wires":[["d8ed8e61.482"]]},{"id":"65013729.cd6df8","type":"function","z":"2988145.2ee976c","name":"split to array","func":"var arr = msg.payload.replace(/\\s+/g, ' ').split(' ');\nmsg.arr = arr;\nreturn msg;","outputs":1,"noerr":0,"x":690,"y":140,"wires":[["a9aa5336.dbaaa"]]},{"id":"daa81b7.bdcc1e8","type":"debug","z":"2988145.2ee976c","name":"payload","active":true,"console":"false","complete":"payload","x":1180,"y":180,"wires":[]},{"id":"2ff681d4.5dcade","type":"split","z":"2988145.2ee976c","name":"","splt":"","x":550,"y":140,"wires":[["65013729.cd6df8"]]},{"id":"d8ed8e61.482","type":"function","z":"2988145.2ee976c","name":"assemble array","func":"msg.payload = \"\";\nfor (var i = 0; i < msg.arr.length; i++) {\n msg.payload += \"[\" + msg.arr[i] + \"]\";\n}\n\nmsg.statusCode = 200;\nreturn msg;","outputs":1,"noerr":0,"x":1000,"y":140,"wires":[["5c2e34b0.692dbc","daa81b7.bdcc1e8"]]}]
and it looks good now, but there is one more glitch... it does not separate hours from minutes...

HTML is notoriously difficult to parse; it's not always well-formed XML, so things like XPath tend to fail.
The HTML node lets you use CSS-style selectors to grab bits of web pages, so something like this may get you closer.
[{"id":"b5b2b310.4dfc5","type":"inject","z":"59370ac1.51144c","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"x":138.5,"y":120,"wires":[["d7e0478a.997b8"]]},{"id":"d7e0478a.997b8","type":"http request","z":"59370ac1.51144c","name":"","method":"GET","ret":"txt","url":"http://jizdnirady.idnes.cz/ceskebudejovice/zjr/?date=9.12.2016%20P%C3%A1&l=Trol%205&f=Strakonick%C3%A1%20-%20obchodn%C3%AD%20z%C3%B3na&t=Ro%C5%BEnov%20-%20to%C4%8Dna&wholeweek=true&ttn=CesBud&submit=true","tls":"","x":332.5,"y":141,"wires":[["a8ff0654.6ac8d"]]},{"id":"a8ff0654.6ac8d","type":"html","z":"59370ac1.51144c","name":"","tag":".zjrtbl","ret":"html","as":"single","x":530.5,"y":146,"wires":[["984182c2.bb29e8"]]},{"id":"984182c2.bb29e8","type":"debug","z":"59370ac1.51144c","name":"","active":true,"console":"false","complete":"false","x":752.5,"y":205,"wires":[]}]
This will pull out just the table; you could then use another HTML node to pull out other parts.
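If the remaining glitch is that hours and minutes end up glued together in one token, that could be handled inside the "split to array" function node from EDIT 2. This is only a rough sketch, assuming each departure comes out of the HTML node as a plain H:MM or HH:MM string (the real page may format the cells differently):

// Hypothetical replacement body for the "split to array" function node.
// Assumes msg.payload is the text of one table row and that departure times
// appear as H:MM or HH:MM tokens, e.g. "5:07 5:27 5:47".
var tokens = msg.payload.replace(/\s+/g, ' ').trim().split(' ');
var times = [];
for (var i = 0; i < tokens.length; i++) {
    var m = tokens[i].match(/^(\d{1,2}):(\d{2})$/); // separate hours from minutes
    if (m) {
        times.push([parseInt(m[1], 10), parseInt(m[2], 10)]); // [hour, minute]
    }
}
msg.arr = times; // e.g. [[5,7],[5,27],[5,47]]
return msg;

If the timetable actually puts the hour in one cell and the minutes in the following cells, only the regular expression needs adjusting; the downstream "assemble array" node will still receive msg.arr.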

Related

Using insert with a large multi-layered table in Lua

I am working on a script for GTA5 and I need to transfer data over to a JS script. So that I don't need to send multiple arrays to JS, I need a single table; the template for the table should appear as below.
The issue I'm having at the moment is in the second section, where I receive all vehicles and loop through them to add each one to said 'vehicleTable'. I haven't been able to find out how "table.insert" is used with a multi-layered table.
So far I've tried the following:
table.insert(vehicleTable,vehicleTable[class][i][vehicleName])
This seems to store an 'object' (a table?), so it does not show up when accessed in the later for loop.
Next,
vehicleTable = vehicleTable + vehicleTable[class][i][vehicleName]
This seemed like it was going nowhere, as I either got an error or nothing happened.
Next,
table.insert(vehicleTable,class)
table.insert(vehicleTable[class],i)
table.insert(vehicleTable[class][i],vehicleName)
This one failed on the second line; I'm unsure why. It didn't even reach the next problem I saw coming, which is that line 3 has no way to specify the "Name" field.
Lastly the current one,
local test = {[class] = {[i]={["Name"]=vehicleName}}}
table.insert(vehicleTable,test)
It works without errors, but ultimately it doesn't file the entry into the table; instead it seems to create its own branch, so an object within the object.
And after about 3 hours of zero progress on this topic, I turn to Stack Overflow for assistance.
local vehicleTable = {
    ["Sports"] = {
        [1] = {["Name"] = "ASS", ["Hash"] = "Asshole2"},
        [2] = {["Name"] = "ASS2", ["Hash"] = "Asshole1"}
    },
    ["Muscle"] = {
        [1] = {["Name"] = "Sedi", ["Hash"] = "Sedina5"}
    },
    ["Compacts"] = {
        [1] = {["Name"] = "MuscleCar", ["Hash"] = "MCar2"}
    },
    ["Sedan"] = {
        [1] = {["Name"] = "Blowthing", ["Hash"] = "Blowthing887"}
    }
}
local vehicles = GetAllVehicleModels();
for i = 1, #vehicles do
    local class = vehicleClasses[GetVehicleClassFromName(vehicles[i])]
    local vehicleName = GetLabelText(GetDisplayNameFromVehicleModel(vehicles[i]))
    print(vehicles[i] .. " " .. class .. " " .. vehicleName)
    local test = {[class] = {[i] = {["Name"] = vehicleName}}}
    table.insert(vehicleTable, test)
end
for k in pairs(vehicleTable) do
    print(k)
    -- for v in pairs(vehicleTable[k]) do
    --     print(v .. " " .. #vehicleTable[k])
    -- end
end
If there is no way to add to a library / table, how would I go about sorting all this without needing to send a million (hash, name, etc.) requests to JS?
Any recommendations or support would be much appreciated.
Aside from the fact that you do not provide the definitions of several functions and tables used in your code (which would be necessary to give a complete answer without making assumptions), there are many misconceptions regarding very basic topics in Lua here.
The most prominent is that you don't know how to use table.insert and what it can do. It inserts (appends by default) a value at a numeric index of a table. Given that you have non-numeric keys in your vehicleTable, this doesn't make much sense.
You also don't know how to use the + operator, and that it does not make any sense to add a table and a string.
Most of your code seems to be the result of guesswork and trial and error.
Instead of referring to the Lua manual so you know how to use table.insert and how to index tables properly, you spent 3 hours trying all kinds of variations of your incorrect code.
Assuming a vehicle model is a table like {["Name"] = "MyCar", ["Hash"] = "MyCarHash"} you can add it to a vehicle class like so:
table.insert(vehicleTable["Sedan"], {["Name"] = "MyCar", ["Hash"] = "MyCarHash"})
This makes sense because vehicleTable.Sedan has numeric indices. And after that line it would contain 2 cars.
Read the manual. Then revisit your code and fix your errors.

ImageJ/Fiji - Save CSV using macro

I am not a coder, but I am trying to turn ThunderSTORM's batch process into an automated one where I have a single input folder and a single output folder.
input_directory = newArray("C:\\Users\\me\\Desktop\\Images");
output_directory = ("C:\\Users\\me\\Desktop\\Results");
for (i = 0; i < input_directory.length; i++) {
    open(input_directory[i]);
    originalName = getTitle();
    originalNameWithoutExt = replace(originalName, ".tif", "");
    fileName = originalNameWithoutExt;
    run("Run analysis", "filter=[Wavelet filter (B-Spline)] scale=2.0 order=3 detector "+
        "detector=[Local maximum] connectivity=8-neighbourhood threshold=std(Wave.F1) "+
        "estimator=[PSF: Integrated Gaussian] sigma=1.6 method=[Weighted Least squares] fitradius=3 mfaenabled=false "+
        "renderer=[Averaged shifted histograms] magnification=5.0 colorizez=true shifts=2 "+
        "repaint=50 threed=false");
    saveAs(fileName+"_Results", output_directory);
}
This probably looks like a huge mess, but the original batch file used arrays and I can't figure out what that is. Taking it out breaks it, so I left it in. The main issues I have revolve around the saveAs part not working.
Using run("Export Results") works but I need to manually pick a location and file name. I tried to set this up to take the file name and rename it to the generic image name so it can save a CSV using that name.
Any help pointing out why I'm a moron? I would also love to only open one file at a time (this opens them all) and close it when the analysis is complete. But I will settle for that happening on a different day if I can just manage to save the damn CSV automatically.
For the most part, I broke the code a whole bunch of times but it's in a working condition like this.
I appreciate any and all help. Thank you!

Read data from ODataModel in loop to calculate sum

I need to read data from an ODataModel in a loop to calculate values. oData.read() is not good for me, as it runs asynchronously and calls another method. I want to loop over the data as if it were an array, and oDataModel.getProperty() can probably help me. I am executing the code below in the Chrome console and getting the result below.
m1 = this.getView().getModel("Model Name");
m1.getProperty("/")
Result is:
Object {SEARCH('61451144935589051'): Object, SEARCH('61451144935589052'): Object, SEARCH('61451144935589053'): Object, SEARCH('61451144935589054'): Object, SEARCH('61451144935589055'): Object…}
However, if I try the code below, I get undefined as output.
m1.getProperty("/SEARCH")
It is absolutely correct that you get undefined. Obviously you have an entity type SEARCH with a single key, and your model stores several entities of this entity type.
You can grab all the data stored in your model and process it as in the appended code example. However, this is strongly discouraged, as it puts too much logic on the client. A better approach would be a function, or even an extra entity, at your OData service.
var data, i, name, names, sum;
data = m1.getProperty("/");
names = Object.getOwnPropertyNames(data);
sum = 0;
for (i = 0; i < names.length; i += 1) {
    name = names[i];
    // you have to check for the correct entity
    if (/SEARCH/.test(name)) {
        sum += data[name].value;
    }
}
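For comparison, the server-side alternative mentioned above could look roughly like this from the client. This is only a sketch under stated assumptions: a v2 ODataModel and a hypothetical SumSearchValues function import exposed by the service (neither name comes from the question):

// Hypothetical: let the OData service do the aggregation instead of the client.
// "SumSearchValues" is an assumed function import; adjust it to whatever the
// service actually exposes.
var oModel = this.getView().getModel("Model Name");
oModel.callFunction("/SumSearchValues", {
    method: "GET",
    urlParameters: {}, // e.g. a filter value expected by the function import
    success: function (oData) {
        console.log("sum from service:", oData); // result calculated on the server
    },
    error: function (oError) {
        console.error("function import failed", oError);
    }
});

This keeps the summing logic in the backend, which is what the recommendation above is getting at.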

Where can I find a large tabbed hierarchical data set for parser testing?

First, apologies as I realize this is only tangentially related to parser programming.
I've spent hours looking for a text file containing something like the following, but with hundreds (hopefully thousands) of sub-entries. A complete biological classification file would be perfect. A massive version of the following would be great, as my parser parses simple tabbed files:
TL;DR - I need a massive single-file hierarchical data set, something like the following:
Kingdoms
    Monera
    Protista
    Fungi
    Plants
    Animals
        Porifera
            Sponges
        Coelenterates
            Hydra
            Coral
            Jellyfish
        Platyhelminthes
            Flatworms
            Flukes
        Nematodes
            Roundworms
            Tapeworms
        Chordates
            Urochordates
            Cephalochordates
            Vertebrates
                Fish
                Amphibians
                Reptiles
                Birds
                Mammals
The best I've been able to find are tree-of-life images (from which I transcribed the sample data set above). A single file with a TON of real data would be awesome. It doesn't have to be a biological classification data set, but I would really like the data to reflect something in the real world. (My parser feeds a menu - it would be great if the remainder of my testing was with a data set that actually meant something!) Even if the file is not tabbed, as long as the data could fairly easily be regexed into a tabbed format... that would be great.
Any ideas? Thanks!
It is possible that the XML layout has changed since the other answer was posted, but the code in that answer is no longer accurate. The resulting dump contains extraneous entries: some of the nodes have aliases (denoted as 'othername') that are reported as distinct nodes themselves.
I used the script below to generate the correct dump.
<?php
$reader = new XMLReader();
// change node_id to the node you want (e.g. 15963 for the primates subtree)
$reader->open('http://tolweb.org/onlinecontributors/app?service=external&page=xml/TreeStructureService&node_id=1');
$set = -1;
while ($reader->read()) {
    switch ($reader->nodeType) {
        case (XMLReader::ELEMENT):
            if ($reader->name == "OTHERNAMES") {
                $set = 1;
            }
            if ($reader->name == "NODES") {
                $set = -1;
            }
            if ($reader->name == "NODE") {
                $set = -1;
            }
            if ($reader->name == "NAME" AND $set == -1) {
                echo str_repeat("\t", $reader->depth - 2); // repeat tabs for depth
                $node = $reader->expand();
                echo $node->textContent . "\n";
            }
            break;
    }
}
?>
This turned out to be such a pain in the ass. I finally tracked down a data feed from "The Tree of Life Web Project" at tolweb.org. I made the PHP script below to provide the basic functionality my post was looking for.
Change the node_id to have it print a tabbed representation of any of tolweb.org's data - just take the id from the page you're browsing on their site and change the node_id below.
Be aware though - their data feeds serve up large files, so definitely download the file to your own server (and change the "open" method below to point to the local file) if you're going to hit it more than once or twice.
More info on tolweb.org data feeds can be found here:
http://tolweb.org/tree/home.pages/downloadtree.html
<?php
$reader = new XMLReader();
$reader->open('http://tolweb.org/onlinecontributors/app?service=external&page=xml/TreeStructureService&node_id=15963'); // 15963 is the primates index
while ($reader->read()) {
    switch ($reader->nodeType) {
        case (XMLReader::ELEMENT):
            if ($reader->name == "NAME") {
                echo str_repeat("\t", $reader->depth - 2); // repeat tabs for depth
                $node = $reader->expand();
                echo $node->textContent . "\n";
            }
            break;
    }
}
?>

DBF Large Char Field

I have a database file that I believe was created with Clipper, but I can't say for sure (I have .ntx files for the indexes, which I understand is what Clipper uses). I am trying to create a C# application that will read this database using the System.Data.OleDb namespace.
For the most part I can successfully read the contents of the tables, but there is one field that I cannot: a field called CTRLNUMS that is defined as CHAR(750). I have read various articles found through Google searches suggesting that fields larger than 255 chars have to be read through a different process than the normal assignment to a string variable, but so far none of the approaches I've found have worked.
The following is a sample code snippet I am using to read the table; it includes two options I used to read the CTRLNUMS field. Both options resulted in 238 characters being returned, even though there are 750 characters stored in the field.
Here is my connection string:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\datadir;Extended Properties=DBASE IV;
Can anyone tell me the secret to reading larger fields from a DBF file?
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
    conn.Open();
    using (OleDbCommand cmd = new OleDbCommand())
    {
        cmd.Connection = conn;
        cmd.CommandType = CommandType.Text;
        cmd.CommandText = string.Format("SELECT ITEM,CTRLNUMS FROM STUFF WHERE ITEM = '{0}'", stuffId);
        using (OleDbDataReader dr = cmd.ExecuteReader())
        {
            if (dr.Read())
            {
                stuff.StuffId = dr["ITEM"].ToString();

                // OPTION 1
                string ctrlNums = dr["CTRLNUMS"].ToString();

                // OPTION 2
                char[] buffer = new char[750];
                int index = 0;
                int readSize = 5;
                while (index < 750)
                {
                    long charsRead = dr.GetChars(dr.GetOrdinal("CTRLNUMS"), index, buffer, index, readSize);
                    index += (int)charsRead;
                    if (charsRead < readSize)
                    {
                        break;
                    }
                }
            }
        }
    }
}
You can find a description of the DBF structure here: http://www.dbf2002.com/dbf-file-format.html
What I think Clipper used to do was modify the Field structure so that, in Character fields, the Decimal Places held the high-order byte of the size, so Character field sizes were really 256*Decimals+Size.
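As a rough illustration of that size rule, here is a small JavaScript (Node.js) sketch that lists the field descriptors of a DBF file and applies the 256*Decimals+Size interpretation to Character fields. It assumes the standard 32-byte dBASE field-descriptor layout described at the link above; the file name is just a placeholder:

// Sketch: dump DBF field descriptors and apply the Clipper
// "real length = 256 * decimals + size" rule for Character fields.
// Assumes a 32-byte file header followed by 32-byte field descriptors
// terminated by a 0x0D byte.
const fs = require('fs');

const buf = fs.readFileSync('stuff.dbf'); // placeholder file name
for (let off = 32; buf[off] !== 0x0D; off += 32) {
    const name = buf.toString('ascii', off, off + 11).replace(/\0.*$/, '');
    const type = String.fromCharCode(buf[off + 11]); // 'C', 'N', 'D', ...
    const size = buf[off + 16];                       // declared field length byte
    const decimals = buf[off + 17];                   // decimal-count byte
    const realSize = (type === 'C') ? 256 * decimals + size : size;
    console.log(name, type, 'declared=' + size, 'real=' + realSize);
}

If that is what is going on here, it would also explain the 238 characters you are seeing: a length byte of 238 with a decimal-count byte of 2 gives 2*256 + 238 = 750, and the Jet provider is presumably honouring only the length byte.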
I may have a C# class that reads DBFs (natively, not via ADO/DAO); it could be modified to handle this case. Let me know if you're interested.
Are you still looking for an answer? Is this a one-off job or something that needs doing regularly?
I have a Python module that is primarily intended to extract data from all kinds of DBF files ... it doesn't yet handle the length_high_byte = decimal_places hack, but it's a trivial change. I'd be quite happy to (a) share this with you and/or (b) get a copy of such a DBF file for testing.
Added later: Extended-length feature added, and tested against files I've created myself. Offer to share code with anyone who would like to test it still stands. Still interested in getting some "real" files myself for testing.
3 suggestions that might be worth a shot...
1 - use Access to create a linked table to the DBF file, then use .Net to hit the table in the access database instead of going direct to the DBF.
2 - try the FoxPro OLEDB provider
3 - parse the DBF file by hand. Example is here.
My guess is that #1 should work the easiest, and #3 will give you the opportunity to fine tune your cussing skills. :)
