QR Code possible data types or standards - iOS

I am developing an iOS application for scanning QR codes. I am successfully able to scan a QR code and read its contents.
Question:
What are the possible data types and formats I can expect from QR codes?
During my search on Google I found that QR codes can be used for:
Contact data
Calendar data
URL
Email address
Phone number
SMS
Plain text
Geo location
Is this the complete list, and is there a standard way of representing each of these data types in a QR code, i.e. a common format for generating QR codes for the types above?

Basically your text information has to be identifiable for what it is. There is a very good summary in the ZXing "Barcode Contents" page (also referenced in another answer below).
Contact data - use MeCard, or vCard (much more verbose), e.g.: MECARD:N:Surname,First;ADR:123 Some St., Town, Zip Code, Country;EMAIL:some_name@some_ip.com;TEL:+11800123123;BDAY:19550131;;
Calendar data - There are two formats: iCalendar (.ics) and vCalendar (.vcs). Both can also include location, alarm, to-do items, etc. Note that both are verbose formats, so you may be better off encoding a short URL to an online file in one of those formats, but then the person scanning needs internet connectivity and has to trust the QR code not to be doing anything bad.
URL - Start your URL with the standard scheme specifier such as http://, e.g.: http://stackoverflow.com/questions/19900835/qr-code-possible-data-types-or-standards
Email address - Start with mailto:, e.g.: mailto:SomeOne@SomeWhere.org
Phone number - Start with tel:, e.g.: tel:+1-212-555-1212
SMS - See RFC 5724 for the sms: URI scheme, e.g.: sms:+1-212-555-1212
Plain text - Just include the text.
Geo location - Use the geo:lat,long,alt URI format: geo:40.71872,-73.98905,100 (100 meters above Google's offices; per RFC 5870 the altitude is in meters).
WIFI - e.g. with SSID 'abc' and password '1234'. For WEP encryption: WIFI:S:abc;T:WEP;P:1234;;. For WPA/WPA2: WIFI:S:abc;T:WPA;P:1234;;. Without encryption: WIFI:S:abc;T:nopass;P:1234;;.
All the above examples were generated with the Python qrcode package from the command line.
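For reference, a minimal sketch of generating one of the payloads above yourself with that package (assuming it is installed, e.g. pip install "qrcode[pil]"; the output file name is arbitrary):
import qrcode

# Encode the WIFI: payload from the example above into a PNG image.
img = qrcode.make("WIFI:S:abc;T:WPA;P:1234;;")
img.save("wifi-qr.png")
The package also installs a qr command-line tool that writes a PNG to stdout, e.g. qr "WIFI:S:abc;T:WPA;P:1234;;" > wifi-qr.png.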

Basically, a QR code yields text data that can be of any type. You can put any kind of data, in any string format, into a QR code; it is totally up to you.
You can think of it like
[NSString stringWithFormat:].

The ZXing project's Barcode Contents page on GitHub has a summary.
For any given data type there may or may not be a standard; most of the formats above are de facto conventions rather than official standards.
If you come up with a non-standard format, please document it and contribute it back to open source.

Related

Slack slash command results in different encoding coming from iOS app or MacOS App

We have a Slash command integration and discovered that the text passed to the slash command is encoded differently if it comes from the (iOS) mobile app compared to the Desktop app.
For the command "/whereis #xsd" on the MacOS Desktop app, the text element in the body is encoded as: text=%3C%23C02MKG1LH%7Cxsd%3E
For the command "/whereis #xsd" on the iOS app the text element in the body is encoded as: text=%26lt%3B%23C02MKG1LH%7Cxsd%26gt%3B
The iOS app is incorrect.
Did anyone else experience this? Any solutions?
(I have posted this question to Slack; they confirmed the behavior a while back, but there has been no solution from them so far.)
This is not a bug. Both are valid encodings, as you can verify by running them through any URL decoder.
The difference is that the string from iOS additionally encodes HTML special characters (like <), while the desktop string does not. To handle both, your app has to first URL-decode the input string and then decode the HTML special characters.
The results are:
Desktop: <#C02MKG1LH|xsd>
iOS: <#C02MKG1LH|xsd>
Here is sample code that will decode both strings correctly in PHP:
<?php
// First URL-decode, then decode HTML entities such as &lt; and &gt;.
function decodeInputString($input)
{
    return htmlspecialchars_decode(urldecode($input));
}

$desktop = "%3C%23C02MKG1LH%7Cxsd%3E";
$ios = "%26lt%3B%23C02MKG1LH%7Cxsd%26gt%3B";

$desktop_plain = decodeInputString($desktop);
$ios_plain = decodeInputString($ios);

// Both dump the same plain string: <#C02MKG1LH|xsd>
var_dump($desktop_plain);
var_dump($ios_plain);

How do some sites download YouTube captions?

This is somewhat of a duplicate of Does YouTube API forbid to download video captions if you are not it's owner? and Get YouTube captions, which both basically say it's not possible to download captions via the YouTube API unless you are the owner or third-party contributions are enabled; however, my question is: how do sites like http://downsub.com/ or http://www.lilsubs.com/ have access to all captions?
In other words, when I access the YouTube API myself (even with the youtubepartner and youtube.force-ssl scopes), I can only download the captions of some videos. When I try the same videos that failed for me with 403: The permissions associated with the request are not sufficient to download the caption track. The request might not be properly authorized, or the video owner might not have enabled third-party contributions for this caption. on these other sites, it works fine. I'm assuming they are using the YouTube API to access the captions, but what special sauce are they using? Some special partner key? A different API version? Are they just scraping the videos themselves, or something?
Send a GET request to:
http://video.google.com/timedtext?lang={LANG}&v={VIDEOID}
Example for your video in comment: http://video.google.com/timedtext?lang=ko&v=0db1_qWZjRA
Let's look at another example of yours, i.e. https://www.youtube.com/watch?v=7068mw-6lmI (and I agree about the differentiation part in your comment).
There are multiple subtitle tracks available for the video:
English
Korean
Spanish
Korean (auto-generated) also called asr (automatic speech recognition)
These correspond to the subtitle name parameter (e.g., name=English).
lang stands for the language code.
In your example: https://www.youtube.com/api/timedtext?lang=es-MX&v=7068mw-6lmI&name=Spanish
If a subtitle track is available, it is possible to translate from it using the tlang parameter.
https://www.youtube.com/api/timedtext?lang=en&v=7068mw-6lmI&name=English&tlang=lv
https://www.youtube.com/api/timedtext?lang=ko&v=7068mw-6lmI&name=Korean&tlang=lv
This would be my bid for what these sites are using, i.e. translation of an available subtitle track (you can confirm by trying a video without any subtitle track as input on one of those sites).
As for asr, a signature seems to always be needed, but as long as one of the subtitle tracks is available, you can use it for translation. E.g. in your OP comment example:
https://www.youtube.com/api/timedtext?lang=en&v=vx6NCUyg1NE&tlang=lv
It looks like the last example is special, with both subtitle tracks being asr (checked with Chrome -> Inspect -> Network), so you need to omit the subtitle name parameter. This difference unfortunately is not visible in the YouTube video's settings wheel.
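For what it's worth, a minimal Python sketch of fetching one of these tracks (assuming the unofficial timedtext endpoint still answers without a signature for this video, which is not guaranteed):
import requests

# lang/name/tlang values are the ones from the examples above.
params = {"lang": "en", "v": "7068mw-6lmI", "name": "English", "tlang": "lv"}
resp = requests.get("https://www.youtube.com/api/timedtext", params=params)
resp.raise_for_status()
print(resp.text)  # XML with <text start="..." dur="..."> elements when it works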
A 2022 answer:
Option 1: Send a curl request to the webpage: curl -L "https://youtu.be/YbJOTdZBX1g", search for timedtext in the result, and you will get a URL. Replace \u0026 with & and you have the link to the subtitles.
Option 2: Use the yt-dlp package:
# For installing see: https://github.com/yt-dlp/yt-dlp#with-pip
from yt_dlp import YoutubeDL

ydl_opts = {
    "skip_download": True,
    "writesubtitles": True,
    "subtitleslangs": ["all", "-live_chat"],
    # Looks like formats available are vtt, ttml, srv3, srv2, srv1, json3
    "subtitlesformat": "json3",
    # You can skip the following option
    "sleep_interval_subtitles": 1,
}

with YoutubeDL(ydl_opts) as ydl:
    ydl.download(["YbJOTdZBX1g"])
There is this unofficial API used by YouTube:
https://www.youtube.com/api/timedtext?lang={LANG}&v={VIDEO_ID}
LANG here is the ISO 639-1 two-letter language code. For your example it would be:
https://www.youtube.com/api/timedtext?lang=ko&v=0db1_qWZjRA
You can see the request in the network tab while toggling the closed-caption button.
I have used youtube-transcript-api successfully to retrieve transcripts. Below is a demo that dumps a transcript into HTML with links back to the timestamps in the video:
import sys
from youtube_transcript_api import YouTubeTranscriptApi

video_id = sys.argv[1]

# Retrieve the available transcripts
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)

# Just use the first transcript, let it raise an exception if none exist.
transcript = next(iter(transcript_list))

print("<html><body>")
for line_map in transcript.fetch():
    st_min = int(line_map['start'] / 60)
    st_sec = int(line_map['start'] - st_min * 60)
    # Link back to the video at the start of this line (rounded to the minute)
    link_to_tstmp = f"https://youtu.be/{video_id}?t={st_min * 60}"
    tstmp_str = ("%2d:%-2d" % (st_min, st_sec)).replace(" ", "&nbsp;")
    print("""<a href="%s">%s</a> %s<br/>""" % (link_to_tstmp, tstmp_str, line_map['text']))
print("</body></html>")
If there are multiple transcripts, the library provides an API to search by language etc.
You can further tweak the logic to merge text so you only get one link every so many minutes; a sketch follows below. I got good results for a lecture by linking every minute and formatting the lines into an HTML table.
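A rough sketch of that merging idea (a hypothetical helper, grouping the fetched lines into one-minute buckets):
from collections import defaultdict

def group_by_minute(lines):
    # lines: the {'start': ..., 'text': ...} dicts returned by transcript.fetch()
    buckets = defaultdict(list)
    for line in lines:
        buckets[int(line["start"] // 60)].append(line["text"])
    # One merged string per minute, keyed by minute index
    return {minute: " ".join(texts) for minute, texts in sorted(buckets.items())}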

Parsing NYC Transit/MTA historical GTFS data (not realtime)

I've been puzzling over this on and off for months and can't find a solution.
The MTA claims to provide historical data in the form of daily dumps in GTFS format here:
http://web.mta.info/developers/MTA-Subway-Time-historical-data.html
See for yourself by downloading the example they provide, in this case Sep 17th, 2014:
https://datamine-history.s3.amazonaws.com/gtfs-2014-09-17-09-31
My problem? The file is gobbledygook. It does not follow the GTFS specification, has no file extension, and when I open it in a text editor it looks like 7800 lines of this:
n
^C1.0^X �枪�^Eʞ>`
^C1.0^R^K
^A1^R^F^P����^E^R^K
^A2^R^F^P����^E^R^K
^A3^R^F^P����^E^R^K
^A4^R^F^P����^E^R^K
^A5^R^F^P����^E^R^K
^A6^R^F^P����^E^R^K
^AS^R^F^P����^E^R[
^F000001^ZQ
6
^N050400_1..S02R^Z^H20140917*^A1�>^V
^P01 0824 242/SFY^P^A^X^C^R^W^R^F^Pɚ��^E"^D140Sʚ>^F
^AA^R^AA^RR
^F000002"H
6
Per the MTA site (which appears untrue):
All data is formatted in GTFS-realtime
Any idea on the steps necessary to transform this mystery file into usable GTFS data? Is there some encoding I am missing? I have looked for 10+ and been unable to come up with a solution.
Also, not to be a stickler, but I am NOT referring to the MTA's realtime data feed, which is correctly formatted and usable. I am specifically referring to the historical data dumps referenced above (I have received many "solutions" that refer only to the realtime feed).
The file you link to is in GTFS-realtime format, not GTFS, and the page you linked to does a very bad job of explaining which format their data is actually in (though it is mentioned in your quote).
GTFS is used to store schedule data, like routes and scheduled arrival times.
GTFS-realtime is generally used to transfer actual transit performance data in real time, like vehicle locations and expected or actual arrival times. It is a protobuf, a compiled binary data format published by Google, which means you can't usefully read it in a text editor; you instead have to load it programmatically using the Google protobuf tools. It can be used as a historical data format in the way the MTA does here, by making daily dumps of the GTFS-rt feed publicly available. It's called GTFS-realtime because various data fields in the realtime feed, like route_id, trip_id, and stop_id, are designed to link to the published GTFS schedules.
I confirmed the validity of the data you linked to by decompiling it using the gtfs-realtime.proto specification and the Google protobuf tools for Python. It begins:
header {
  gtfs_realtime_version: "1.0"
  timestamp: 1410960621
}
entity {
  id: "000001"
  trip_update {
    trip {
      trip_id: "050400_1..S02R"
      start_date: "20140917"
      route_id: "1"
    }
    stop_time_update {
      arrival {
        time: 1410960713
      }
      stop_id: "140S"
    }
  }
}
...
and continues in that vein for a total of 55833 lines (in the default string output format).
EDIT: the Python script used to convert the protobuf into its string representation is very simple:
import gtfs_realtime_pb2 as gtfs_rt

# Read the raw binary protobuf downloaded from the MTA
with open('gtfs-rt.pb', 'rb') as f:
    raw_str = f.read()

# Parse it as a GTFS-realtime FeedMessage and print the text representation
msg = gtfs_rt.FeedMessage()
msg.ParseFromString(raw_str)
print(msg)
This requires gtfs-realtime.proto to have been compiled into gtfs_realtime_pb2.py using protoc (following the instructions in the Python protobuf documentation under "Compiling Your Protocol Buffers") and placed in the same directory as the Python script. Furthermore, the binary protobuf downloaded from the MTA needs to be named gtfs-rt.pb and located in the same directory as the Python script.
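If you would rather not run protoc yourself, the same decoding can be done with the published gtfs-realtime-bindings package (pip install gtfs-realtime-bindings), which ships a precompiled gtfs_realtime_pb2 module; a minimal sketch:
from google.transit import gtfs_realtime_pb2

# Parse the binary dump downloaded from the MTA.
feed = gtfs_realtime_pb2.FeedMessage()
with open("gtfs-rt.pb", "rb") as f:
    feed.ParseFromString(f.read())

# Print the trip_id of every trip update in the dump.
for entity in feed.entity:
    if entity.HasField("trip_update"):
        print(entity.trip_update.trip.trip_id)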

Read site html from a site in a different geo region

I am using Python and BeautifulSoup to read HTML pages. Unfortunately some sites redirect to the version for my geo region (AU), so I can't retrieve the target country's version, i.e. UK, US, FR, NZ, ...
I have tried using a VPN service, but this requires me to manually change the region, so I can't automate the process. I have tried using the Python Quartz.CoreGraphics library to click the options on screen, but this is temperamental.
Is there a way I can achieve this programmatically?
I have managed to nut this one out myself. It's best answered by example, here for reading a UK-based site:
import urllib2
url = 'Some-uk-url'
req = urllib2.Request(url)
req.add_header('Accept-Language', 'en-gb')
req.add_header('X-Forwarded-For', [a uk proxy ipaddress here])
htmltext = urllib2.urlopen(req).read()
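For reference, a rough Python 3 equivalent using requests (note that X-Forwarded-For only works on sites that honor that header; otherwise an actual UK proxy or VPN exit is needed):
import requests

headers = {
    "Accept-Language": "en-gb",
    "X-Forwarded-For": "203.0.113.7",  # placeholder: put a UK proxy IP here
}
htmltext = requests.get("Some-uk-url", headers=headers).text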

Indy IMAP4 does not display German symbols correctly

I am using the TIdIMAP4 component to fill a string grid with the messages from my GMail mailbox.
var IMAPClient: TIdIMAP4;
Some messages have German umlauts. When I call IMAPClient.RetrieveAllHeaders(MyMsgList), the string grid is populated as expected (all umlauts are displayed), but there are no UIDs (I guess RetrieveAllHeaders just doesn't fetch UIDs).
When I call IMAPClient.UIDRetrieveAllEnvelopes(MyMsgList), all the additional attributes of a message are there, but the headers are displayed as gibberish (=?ISO-8859-1?Q?_Die_Br=FCcke_von_Arnheim?= instead of 'Die Brücke von Arnheim').
I've read many supportive posts but could not find an answer to why Indy's IMAP4 treats German characters incorrectly.
Any ideas?
RetrieveAllHeaders() decodes the raw data it retrieves. UIDRetrieveAllEnvelopes() retrieves the raw data only; it does not decode it. You can decode the raw headers manually by calling Indy's DecodeHeader() function in the IdCoderHeader unit.
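For illustration, the raw header above is an RFC 2047 "encoded-word"; this Python sketch (not Delphi, just to show the mechanics that DecodeHeader() performs on the Indy side) recovers the umlauts:
# Decode an RFC 2047 encoded-word header, as Indy's DecodeHeader() does.
from email.header import decode_header

raw = "=?ISO-8859-1?Q?_Die_Br=FCcke_von_Arnheim?="
decoded = "".join(
    chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
    for chunk, charset in decode_header(raw)
)
print(decoded)  # ' Die Brücke von Arnheim' (leading space from the encoded '_')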
