Parsing NYC Transit/MTA historical GTFS data (not realtime) - parsing

I've been puzzling on this on and off for months and can't find a solution.
The MTA claims to provide historical data in form of daily dumps in GTFS format here:
[http://web.mta.info/developers/MTA-Subway-Time-historical-data.html][1]
See for yourself by downloading the example they provide, in this case Sep, 17th , 2014:
[https://datamine-history.s3.amazonaws.com/gtfs-2014-09-17-09-31][1]
My problem? The file is gobbledygook. It does not follow GTFS specifications, has no extension, and when I open it using a text editor it looks like 7800 lines of this:
n
^C1.0^X �枪�^Eʞ>`
^C1.0^R^K
^A1^R^F^P����^E^R^K
^A2^R^F^P����^E^R^K
^A3^R^F^P����^E^R^K
^A4^R^F^P����^E^R^K
^A5^R^F^P����^E^R^K
^A6^R^F^P����^E^R^K
^AS^R^F^P����^E^R[
^F000001^ZQ
6
^N050400_1..S02R^Z^H20140917*^A1�>^V
^P01 0824 242/SFY^P^A^X^C^R^W^R^F^Pɚ��^E"^D140Sʚ>^F
^AA^R^AA^RR
^F000002"H
6
Per the MTA site (appears untrue)
All data is formatted in GTFS-realtime
Any idea on the steps necessary to transform this mystery file into usable GTFS data? Is there some encoding I am missing? I have looked for 10+ and been unable to come up with a solution.
Also, not to be a stickler but I am NOT referring to the MTA's realtime data feed, which is correctly formatted and usable. I am specifically referring to the historical data dumps I reference above (have received many "solutions" referring only to realtime data feed)

The file you link to is in GTFS-realtime format, not GTFS, and the page you linked to does a very bad job of explaining which format their data is actually in (though it is mentioned in your quote).
GTFS is used to store schedule data, like routes and scheduled arrival times.
GTFS-realtime is generally used to transfer actual transit performance data in real-time, like vehicle locations and expected or actual arrival times. It is a protobuf, a specification for compiled binary data publicized by Google, which means you can't usefully read it in a text editor, but you instead have to load it programmatically using the Google protobuf tools. It can be used as a historical data format in the way MTA is here, by making daily dumps of the GTFS-rt feed publicly available. It's called GTFS-realtime because various data fields in the realtime like route_id, trip_id, and stop_id are designed to link to the published GTFS schedules.
I confirmed the validity of the data you linked to by decompiling it using the gtfs-realtime.proto specification and the Google protobuf tools for Python. It begins:
header {
gtfs_realtime_version: "1.0"
timestamp: 1410960621
}
entity {
id: "000001"
trip_update {
trip {
trip_id: "050400_1..S02R"
start_date: "20140917"
route_id: "1"
}
stop_time_update {
arrival {
time: 1410960713
}
stop_id: "140S"
}
}
}
...
and continues in that vein for a total of 55833 lines (in the default string output format).
EDIT: the Python script used to convert the protobuf into string representation is very simple:
import gtfs_realtime_pb2 as gtfs_rt
f = open('gtfs-rt.pb', 'rb')
raw_str = f.read()
msg = gtfs_rt.FeedMessage()
msg.ParseFromString(raw_str)
print msg
This requires gtfs-realtime.proto to have been compiled into gtfs_realtime_pb2.py using protoc (following the instructions in the Python protobuf documentation under "Compiling Your Protocol Buffers") and placed in the same directory as the Python script. Furthermore, the binary protobuf downloaded from the MTA needs to be named gtfs-rt.pb and located in the same directory as the Python script.

Related

Problem SamplingRateCalculatorList (00000283DDC3C0E0) : All classes are empty ! OTB + QGis

I use OTB (Orfeo Tool Box) in QGis for classification. When I use the ImageTrainClassifier tool in a batch process, I have a problem for some images. Instead of returning a model in a xml/txt file format, it returns several files with those extensions : .xml_rates_1, .xml_samples_1.dbf, .xml_samples_1.prj, .xml_samples_1.shp, .xml_samples_1.shx, .xml_stats_1 (I have the same files with txt instead of xml if I use txt file format as output).
During the execution of the algorithms, I have only one warning message :
(WARNING): file ..\Modules\Learning\Sampling\src\otbSamplingRateCalculatorList.cxx, line 99, SamplingRateCalculatorList (00000283DDC3C0E0): All classes are empty !
And after that :
(FATAL) TrainImagesClassifier: No samples found in the inputs!
The problem is that after that, I want to use ImageClassifier, that takes the model of ImageTrainClassifier in input, that I don’t have.
Thanks for your help

How to read string stored in hdf5 format files by DM

I am scripting with DM and would like to read hdf5 file format.
I borrowed Tore Niermann's gms_HDF5_Plug-In (hdf5_GMS2X_amd64.dll) and his CMD_import_hdf5.s script. It use h5_read_dataset(filename, datapath) to read a image dataset.
I am trying to figure out the way to read a string info stored in the same file. I am particular interested to read the angle stored in string as shown in this figure.Demonstrated string to read. The h5_read_dataset(filename, datapath) function doesn't work for reading string.
There is a help file (hdf5_plugin.chm) with a list of functions but unfortunately I can't open them to see more info.
hdf5_plugin.chm showing the function list.
I suppose the right function to read strings should be something like h5_read_attr() or h5_info() but I didn't test them out. DM always says the two functions doesn't exist.
After reading out the angle by string, I will also need a bit help to convert the string to a double datatype.
Thank you.
Converting String to Number is done with the Val() command.
There is no integer/double/float concept for variables in DM-script, all are just number. ( This is different for images, where you can define the numeric type. Also: For file-inport/export a type differntiation can be made using the taggroup streaming commands in the other answer. )
Example script:
string numStr = "1.234e-2"
number num = val( numStr )
ClearResults()
Result( "\n As string:" + numStr )
Result( "\n As value:" + num )
Result( "\n As value, formatted:" + Format(num,"%3.2f") )
Potential answer regarding the .chm files: When you download (or email) .chm files in Windows, the OS classifies them as "potentially dagerouse" (because it could contain executable HTML code, I think). As a result, these files can not be shown by default. However, you can right-click these files and "unblock" them in the file properties.
Example:
I think this will be most likely a question specific to that plugin and not general DM scripting. So it might be better to contact the plugin-author directly.
The alternative (not good) solution would be to "rewrite" your own HDF5 file-reader, if you know the file-format. For this you would need the "Streaming" commands of the DM script language and browse through the (binary?) source file to the apropriate file location. The starting point for reading on this in the F1 help documentation would be here:

open office crashes after some time giving garbled font in converted PDF

We are converting word to pdf using the openoffice(3.4.1 version) in java with JODConverter.
below is the code used.
OpenOfficeConnection connection =
new SocketOpenOfficeConnection(2100);
try
{
connection.connect();
DocumentConverter converter =
new OpenOfficeDocumentConverter(connection);
converter.convert(inputFile, outputFile);
connection.disconnect();
return "Sucess " + DestinationPath + DestinationFileName;
}
catch (Exception localException1) {
}
The problem is that after random no of days the converted PDF contains the garbled fonts.
like # # ! $ $ " % &
The only solution we have so far is to restart the server. System guys are saying the the problem is with Open Office.
We are using open office to convert the document since it converts the doc files exactly including all the formatting and table structure.
what could be the solution to this.
So OpenOffice can be a little temperamental when running on a server, especially as it isn't multi-threaded and you end up having to run a pool of OpenOffice processes - see How can I use OpenOffice in server mode as a multithreaded service?.
Added to that often the rendering is off when converting to PDF - see https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=68865 which is why you may want to consider using a conversion service to automate the conversion tasks for you ?
For complete transparency I work for Zamzar (an online file conversion service), we have recently released a developer API - https://developers.zamzar.com/ that allows you to convert between a multitude of file types, specifically applicable to you here in that we support both doc and docx to pdf with little or not loss in the way the PDF is rendered. It maybe worth a look to see if this is a better alternative to trying to run your own solution through OpenOffice on a server.

QR Code possible data types or standards

I am developing an iOS Application for scanning QR Codes. I am successfully able to scan and get code from QR code.
Question:
My question is what are possible data types and format I can expect from QR Codes?
During my search on google I found QR Code can be used for
Contact data
Calendar data
URL
Email address
Phone number
SMS
Plain text
Geo location
Is this the complete list and is there same standard to represent above data in QR Codes? Means same way of generating QR Code for above QR types.
Is there any standard way of generating and representing data in QR Code?
Basically your text information has to be identifiable for what it is:
There is a very good summary here.
Contact data - use MeCard, or vCard (much more verbose), e.g.: MECARD:Surname, First;ADR:123 Some St., Town, Zip Code, Country;EMAIL:some_name#some_ip.com;TEL:+11800123123;BDAY:19550231;;
Gives:
Calendar data - There are two formats about iCalendar (.ics) & vCalendar (.vcs). These formats can also include location, alarm, to-do items, etc. Note that these are both verbose formats and you may be better off using a short URL to an online file in the file format but the person scanning needs to have internet connectivity and be willing to trust the QR code not to be doing anything bad.
URL: Start your url with the standard format specifier such as http://, e.g.: http://stackoverflow.com/questions/19900835/qr-code-possible-data-types-or-standards
Gives:
Email address - Start with mailto:SomeOne#SomeWhere.org gives:
Phone number - Start with tel: e.g. tel:+1-212-555-1212 gives:
SMS - See the RFC 5724.
Plain text - Just include the text.
Geo location - Use the geo:lat,long,alt format URI: geo:40.71872,-73.98905,100 (100 feet above Googles offices) gives:
WIFI - (ssid is 'abc' and password is '1234'). For WEP encryption: WIFI:S:abc;T:WEP;P:1234;;. For WPA/WPA2: WIFI:S:abc;T:WPA;P:1234;;. Without encryption: WIFI:S:abc;T:nopass;P:1234;;.
All the above example were generated with the Python qrcode package from the command line.
Basically, QR Code returns text data that can be of any type. You can put any type of data in any string format in QR Code. It totally depends on you.
You can consider it as
[NSString stringWithFormat].
Github - Zxing (Barcode Contents) has a summary.
There may or may not be a standard.
If you are looking for non-standard formats,
please update your documentation and contribute to open source.

Load or Stress Testing Tool with URL Import Functionality

Can someone recommend a load testing tool which allows you to either:
a. replay an IIS (7) log(s) to simulate a real live site daily run;
b. import a CSV or equivalent list of URLS so we can achieve a similar thing as above but at a URL level;
c. .net API so I can create simple tests easily from my list of URLS is also a good way to go.
I do not really want to record my tests.
I think I can do B) with WAPT but need to create an XML file manually, not too much grief, but wondering if any tools cover these scenarios out the box.
Visual Studio Test Edition would require some code to parse the file into a suitable test run.
It is a great load testing solution.
Our load testing service lets you write a very simple script using JavaScript to pull data out of a CSV file and then fetch those URLs. For example, the following code would pluck 10 random URLs from the CSV file and fetch them as part of a single session:
var c = browserMob.openHttpClient();
var csv = browserMob.getCSV("urls.csv");
browserMob.beginTransaction();
for (var i = 0; i < 10; i++) {
browserMob.beginStep("Step 1");
var url = csv.random().get("url");
c.get(url);
browserMob.endStep();
}
browserMob.endTransaction();
The CSV file itself needs to be a normal CSV file with the first row containing a header named "url". This script would be run repeatedly for each virtual user participating in a load test.
We have support for so called 'uri-format' in our open-source tool called Yandex.Tank You simply put all your uris to a file, one uri -- one line, then specify headers in your load.ini like this:
[phantom]
address=example.org
rps_schedule=line(1, 1600, 2m)
headers = [Host: mts-maps.yandex.ru]
[Connection: close] [Bloody: yes]
ammo_file = ammo.uri
ammo.uri:
/
/index.html
/1/example.html
/2/example.html

Resources