Is there a listing of known whois query output formats? - parsing

TL;DR: I need a source for as many different output formats from a whois query as possible.
Background:
I am looking for a single reference that can provide as many (if not all) unique whois query output formats as possible.
I don't believe this exists but hope to be proven wrong.
This appears to be an age-old problem
This Stack Overflow post from 2015 references the challenge of handling the "~40 formats" that the author was aware of.
The author never detailed any of these formats.
The RFC for whois is... depressing
The IETF ran an analysis in 2015 that examined the components of whois at each RIR at the time
In my own research I see that registries like JPNIC do not appear to comply with the APNIC standards
I am aware of existing tools that do a bang-up job parsing whois (python-whois, for example); however, I'd like to hedge my bets against outliers with odd formats. I'm also open to possible approaches to gather this information, though that would likely be too broad to fit this question.
Hoping there is a simple "go here and download this" answer. Hoping...

"TL;DR: I need a source for as many different output formats from a whois query as possible."
There isn't, unless you use some kind of provider that does this for you, with whatever caveats that brings.
Or more precisely, there isn't anything public, maintained and exhaustive. You can find various libraries that try to do this, in various languages, but none is complete, because this is basically an impossible task, especially if you want to cover all TLDs, including ccTLDs. (You are also not framing your constraints very precisely, nor in fact saying whether you are asking about domain name data in whois, or IP address/ASN data.)
Some providers do of course try to do this and offer you an abstract, uniform API. But why would anyone share their internal secret sauce, that is, their list of parsers and so on? There is no business incentive to do that.
As for open-source library authors (I was one at some point), it is tedious and absolutely unrewarding to keep updating a library forever with every new format and tweak per registry. A battle-scar example: one registrar in the past changed its output format on every query! One query gave you somefield: somevalue, while the next gave somefield:somevalue or somefield somevalue, etc. And that is only a simple example.
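The variability in that battle-scar example can be absorbed to some degree with a deliberately tolerant line parser. Below is only a sketch in Python; the sample field names are hypothetical, and real registries will still produce lines that no heuristic catches:

```python
import re

def parse_whois_line(line):
    """Best-effort split of one whois output line into a (field, value) pair.

    Tries 'field: value' and 'field:value' first, then falls back to
    'field   value' (two or more spaces). Returns None for comments and
    lines that don't look like a pair.
    """
    line = line.rstrip()
    if not line or line.startswith(("%", "#", ">>>")):
        return None  # comment / informational line
    m = re.match(r"^([A-Za-z][\w/. -]*?)\s*:\s*(.*)$", line)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    # Fallback for colon-less output: split on the first run of 2+ spaces.
    parts = re.split(r"\s{2,}", line, maxsplit=1)
    if len(parts) == 2:
        return parts[0].strip(), parts[1].strip()
    return None
```

For example, `parse_whois_line("somefield:somevalue")` and `parse_whois_line("somefield: somevalue")` both yield `("somefield", "somevalue")`. The point of the sketch is the shape of the problem, not completeness: every registry adds its own quirks on top.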
RFC 3912 specified just the transport part, not the content, hence the proliferation of cases. In the ccTLD world especially, each registry is king in its kingdom and free to implement whatever it wants, the way it wants. The protocol also had some serious limitations (e.g. internationalization: what "charset" is used for the underlying data?) that were circumvented in different ways, such as passing "options" in your query; of course, none of these are standardized in any way.
At the very least, the gTLD whois format is specified here:
https://www.icann.org/resources/pages/approved-with-specs-2013-09-17-en#whois
Note however that due to GDPR there were changes (see https://www.icann.org/resources/pages/gtld-registration-data-specs-en/#temp-spec) and there will be further changes in the future.
However, you would be strongly advised to look at RDAP instead of whois.
RDAP is now a requirement for all gTLD registries and registrars. As it is JSON, it immediately solves the problem of format.
Its core specifications are:
RFC 7480 HTTP Usage in the Registration Data Access Protocol (RDAP)
RFC 7481 Security Services for the Registration Data Access Protocol (RDAP)
RFC 7482 Registration Data Access Protocol (RDAP) Query Format
RFC 7483 JSON Responses for the Registration Data Access Protocol (RDAP)
RFC 7484 Finding the Authoritative Registration Data (RDAP) Service
You can find various libraries doing RDAP for you (see below for links), but at its core it is JSON over HTTPS so you can emulate simple cases with any kind of HTTP client library.
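As a sketch of how little machinery this takes: RFC 7484 defines a bootstrap registry (published by IANA as dns.json) mapping TLDs to RDAP base URLs; resolving it is just a table lookup plus string assembly. The bootstrap data below is a hand-written illustrative sample with made-up .test URLs, not the live IANA registry; in practice you would fetch the real file and the resulting URL with whatever HTTP client you like:

```python
# Illustrative subset of an RFC 7484 "dns.json" bootstrap document.
# These entries are sample data, NOT the authoritative IANA registry.
SAMPLE_BOOTSTRAP = {
    "services": [
        [["com", "net"], ["https://rdap.example-registry.test/v1/"]],
        [["org"], ["https://rdap.example-org.test/rdap/"]],
    ]
}

def rdap_lookup_url(domain, bootstrap):
    """Resolve the RDAP domain-lookup URL for `domain` from bootstrap data."""
    tld = domain.rsplit(".", 1)[-1].lower()
    for tlds, urls in bootstrap["services"]:
        if tld in tlds:
            # RDAP lookup path for a domain is <base>/domain/<name>.
            return urls[0].rstrip("/") + "/domain/" + domain
    raise LookupError("no RDAP service registered for ." + tld)
```

The response is plain JSON, so the "parsing" step reduces to `json.loads` plus reading well-defined members, which is exactly the improvement over whois.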
Work is underway to fix some missing or insufficiently precise details in RFC 7482 and 7483.
You need also to take into account ICANN specifications (again, only for gTLDs of course):
https://www.icann.org/en/system/files/files/rdap-technical-implementation-guide-15feb19-en.pdf
https://www.icann.org/en/system/files/files/rdap-response-profile-15feb19-en.pdf
Note that, right now, even though it is an ICANN requirement, you will find a lot of missing or broken gTLD registry and registrar RDAP servers. You will also find a lot of "deviations" in replies from what the specification would lead you to expect.
I gave full details in various other questions here, so maybe have a look:
https://stackoverflow.com/a/61877920/6368697
https://stackoverflow.com/a/48066735/6368697
https://webmasters.stackexchange.com/a/115605/75842
https://security.stackexchange.com/a/213854/137710
https://serverfault.com/a/999095/396475
PS: a philosophical note on "Hoping there is a simple "go here and download this" answer. Hoping...", because a lot of people hoped for that in the past (see the initial remark at the beginning). Let us imagine you go forward and build this magnificent resource with all the exhaustive details. Would you be inclined to just share it with anyone, for free? The answer is probably no, for obvious reasons. The same happened in the past to others who went down the same path, hence the various providers now offering you more or less this service (you would need to find out which formats are parsed, the rate limits, the prices, etc.), but nothing freely available.
Now you can just dream/hope that every registry and registrar switches to RDAP AND implements it properly. Then the problem of format is solved once and for all. However, those two requirements ("every" + "properly") are not small, and may not happen "soon", particularly among ccTLDs, where registries are not mandated by any external force (except market pressure?) to implement RDAP at all.

Related

How trustworthy are polls by pinpoll

I was recently looking at security issues with online polls, and the problem with online elections and how they can sometimes be tampered with very easily.
It caught my eye that a lot of websites I visit, and even local newspapers in my area, use "Pinpoll" for online polls.
So I wanted to know: how trustworthy and secure are these polls?
Tobias here, Founder and CEO of Pinpoll.
I agree with @GreyFairer, let's not discuss this on SO (unless you want to know why fingerprinting libraries shouldn't be used to identify individual clients, or how Pinpoll uses WebSockets to broadcast live updates across the globe).
Just send me an e-mail at privacy@pinpoll.com and I'm happy to explain in more detail what we do (and cannot do) to protect polls against bots and fake votes.
And let me make one thing clear: we're one of the most trustworthy providers in Europe, especially when it comes to complying with the EU's strict data protection laws.
One example: You won't find a single request to a server other than our own (located in the EU) in our interactive elements.
And one last thing: What might be annoying to you (which is fully accepted), is interesting and entertaining to others. So let's agree to disagree when it comes to online polls in news portals ;)

Confusion about the 005 IRC numeric and general RFC

After reading through the most recent IRC RFC I've gotten a bit confused:
the RFC states, under section 5.1, that response 005 is used for a bounce message,
but whenever I connect to an IRC server, the 005 numeric response is used for ISUPPORT, as it's described here.
Am I wrong to assume that RFC 2812 is the newest? Or is there some addendum I've missed on the change of 005 to RPL_ISUPPORT?
I also found this earlier SO question (it's from 2011, but that's still newer than any documentation I can find) in which the 005 reply is referred to as "map", which is a completely different third meaning.
To add to the confusion, I found another 2011 SO question here, in which someone points out that RFC 2812 is not the one implemented and that RFC 1459 should be followed instead; however, in Section 6: Replies, the replies from 0-199 are missing, and I'm unable to find them anywhere in the document.
I hope that someone can help shed a bit of light on the IRC documentation nightmare for me.
RFC 2812 and its companions only reflected usage on IRCNet at the time. They were actually more of a political statement following the "great split" between EFNet and IRCNet. Rather than reflect any sort of community consensus, they
sought to codify IRCNet's practices as the "standard", even as numerous other networks had adopted competing implementations around the "TS" (timestamp) protocol.
The only known implementation of RPL_BOUNCE was in IRCNet's ircd. Following the widespread adoption of 005 to mean RPL_ISUPPORT, and its growing necessity for conveying differences in implementation to clients, RPL_BOUNCE was moved to the 010 numeric, and IRCNet itself has adopted 005 as RPL_ISUPPORT.
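The modern 005 usage is at least mechanically simple: RPL_ISUPPORT carries a list of TOKEN or TOKEN=value parameters, with a -TOKEN form to withdraw an earlier token. A minimal sketch, not tied to any particular ircd's token set:

```python
def parse_isupport(params):
    """Parse the middle parameters of a 005 RPL_ISUPPORT reply into a dict.

    `params` excludes the leading client nick and the trailing
    'are supported by this server' text. Value-less tokens map to True.
    """
    caps = {}
    for token in params:
        if token.startswith("-"):          # '-TOKEN' withdraws an earlier token
            caps.pop(token[1:], None)
        elif "=" in token:
            key, _, value = token.partition("=")
            caps[key] = value
        else:
            caps[token] = True
    return caps
```

For example, `parse_isupport(["CHANTYPES=#&", "NICKLEN=30", "WHOX"])` yields `{"CHANTYPES": "#&", "NICKLEN": "30", "WHOX": True}`. Escaped values and per-token semantics still vary by server, which is exactly the documentation problem described above.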
RPL_BOUNCE was itself a reflection of IRCNet philosophy. Servers on IRCNet have historically been tightly restricted along geographic and nationalistic lines: for example, a server based in France might only accept connections from France and neighboring countries. In the past this was very strictly enforced, with servers expected to serve only the limited user bases they had applied to serve, and only a small number of "open" servers permitted at any given time for the benefit of users without a server in their area. The overall effect was that any given user had only a few servers they were authorized to connect to, based on their
internet provider and geographic location; the "bounce" numeric therefore gave geographically restricted servers a way to advertise another server for the user to connect to.
RPL_ISUPPORT was submitted as an Internet Draft, but for reasons unknown there was no follow-through on moving it towards the RFC stage.
In many cases, the source code of the server (ircd) software itself, and occasionally, some text files included with it, are the only meaningful documentation of modern usage - especially for server-to-server protocols, which are now completely nonstandard and tied to specific implementations.
There are some groups attempting to harmonize the various client-server extensions, such as the IRCv3 working group, but really, RFC1459 is still the least common denominator.
Postel's admonition to be conservative in what you do, be liberal in what you accept from others is especially true with respect to IRC, as clients must contend with an ever growing and ever diverging array of implementations. Good luck.
Indeed, apart from RFC 1459 there is no global documentation.
Your best bet is to look at how a certain IRCd uses it, and check whether you can get away with your interpretation on other IRCds.
The problem is that after too many forks, splits and reimplementations, there is no central authority to define what is used in which way.
Really, use the implementations as the reference. If you choose to implement RFC-compliant behavior, you usually run into problems; e.g. eggdrop has RFC-compliant CTCP support, which allows users to circumvent "no CTCP" channel modes.

HL7 CLIA and Lab Name location

I have had a request from a client to pull in the Lab Name and CLIA information from several different vendors' HL7 feeds. The problem is I am unsure which node I should really pull this information from.
I notice one vendor is using ZPS, and it appears they have Lab Name and CLIA there, although I see that others do not use ZPS. Just curious: what would be the appropriate node to pull these from?
I see the header nodes look really abbreviated with some of my vendors. I need a perfectly readable name like 'Johnson Hospital'. Any suggestions on the fields you would use to pull the CLIA and Lab Name?
Welcome to the wild world of HL7. This exact scenario is why interface engines are so prevalent and useful for message exchange in the healthcare industry.
Up until HL7 v2.5.1, I believe, there was no standardization around CLIA identifiers. Assuming you are receiving an ORU^R01 message, you may want to look at segment OBX, field 15, which may carry a producer or lab identifier. The catch is that there is a very slim chance they are using HL7 2.5.1 or implementing the guidelines as intended. There are a lot of reasons for all of this, but the takeaway is that you should be prepared to do some work for each and every integration.
For the data, be prepared to exchange or ask for a technical specification from your trading partner. If that is not a possibility, or if they do not have one, ask for a sample export of representative messages from their system, or see whether they have a vendor reference. Since the data you are looking for is not as well established as something like an address, there is a high likelihood you will have to get it from different segments and fields for each trading partner. The ZPS segment in your example is a good reference: any segment starting with Z is a custom segment, created because the vendor or trading partner could not find a good existing place to store that data, so they made a new segment themselves.
For the identifiers, what I would recommend is to create a translation or mapping table. So if you receive JHOSP or JH123, you can translate/map that to 'Johnson Hospital'. Each EMR or hospital system will have its own way to represent different values, and there is no guarantee they will be consistent, so you must be prepared to handle that scenario.
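The two ideas above (positional field access plus a local mapping table) can be sketched like this. The sample OBX segment and the JHOSP/JH123 codes are invented for illustration; as discussed, real feeds may put the producer ID somewhere else entirely:

```python
# Hypothetical mapping table from vendor-specific codes to readable names.
LAB_NAME_MAP = {"JHOSP": "Johnson Hospital", "JH123": "Johnson Hospital"}

def hl7_field(segment, n):
    """Return field n of a raw pipe-delimited HL7 v2 segment (OBX-15 -> n=15).

    Works for non-MSH segments, where index n after splitting on '|'
    corresponds to field n; returns "" when the field is absent.
    """
    parts = segment.split("|")
    return parts[n] if n < len(parts) else ""

# Invented sample OBX segment carrying a producer ID in OBX-15.
obx = "OBX|1|ST|GLU^Glucose||95|mg/dL|70-99||||F|||20230101|JHOSP"

producer_id = hl7_field(obx, 15)                       # "JHOSP"
lab_name = LAB_NAME_MAP.get(producer_id, producer_id)  # "Johnson Hospital"
```

Falling back to the raw code when the map has no entry (the `.get` default) keeps unexpected identifiers visible instead of silently dropping them.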

What is a concise way of understanding RESTful and its implications?

**Update: hooray! So it is a journey of practice and understanding. ;) Now I no longer feel so dumb.**
I have read many articles on REST and coded up several Rails apps that make use of RESTful resources. However, I never really felt like I fully understood what it is, and what the difference is between RESTful and not-RESTful. I also have a hard time explaining to people why/when they should use it.
If someone has found a very clear explanation of REST and the circumstances of when/why/where to use it (and when not to), it would benefit the world if you could put it up. Thanks! =)
REST is usually learned like this:
You hear about REST being using HTTP the way it was meant to be used, and from that you shun SOAP Web Services' envelopes, since most of what's needed by many SOAP standards are handled by HTTP in a simple, no-nonsense way. You also quickly learn that you need to use the right method for the right operation.
Later, perhaps years later, you hear that REST is more than that. REST is in fact also the concept of linking between resources. This often takes a while to grasp the full meaning of, but when you learn this, you start introducing hyperlinks into your responses so that clients can navigate your system without being coupled to how the server wants to name its resources (i.e. the URIs).
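A tiny sketch of that idea: the representation carries its own next steps, and the client looks links up by relation name instead of building URIs itself. The field names (rel, href) and the order resource here follow common convention and are purely illustrative, not a mandated standard:

```python
# A hypothetical order representation with hypermedia links.
order = {
    "id": 42,
    "status": "unpaid",
    "links": [
        {"rel": "self",    "href": "/orders/42"},
        {"rel": "payment", "href": "/orders/42/payment"},
        {"rel": "cancel",  "href": "/orders/42/cancel"},
    ],
}

def link(doc, rel):
    """Follow a relation by name, however the server chose to shape its URIs."""
    return next(l["href"] for l in doc["links"] if l["rel"] == rel)
```

If the server later renames its URIs (say, /orders/42/pay), a client calling `link(order, "payment")` keeps working unchanged, which is the decoupling being described.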
Even later, you learn that you still haven't understood REST! And this is because you find out that media types are important. You start making media types called application/vnd.example.foo+json and put hyperlinks in them, since that's already your understanding of REST.
Years pass, and you re-read Fielding's thesis for the umpteenth time, to see if there's anything you missed, and it suddenly dawns upon you what the HATEOAS constraint really is: it's about the client not having any notion of how the server's resources are structured, but discovering these relationships at runtime. It also means that the screen in front of the user is driven completely by what is passed over the wire, so in fact, if a server passes an image/jpeg then that's what you're supposed to show to the user, not an error message saying "AtomProcessor can't handle image/jpeg".
I'm just coming to terms with #4 and I'm hoping the ladder isn't much longer! It's taken me seven years.
This article does a good job of classifying the differences between several HTTP application styles, from WS-* to RESTian purity. What I like about this post is that it reminds you that most of what we call REST is really something only partly in line with Roy Fielding's original definition.
InfoQ has a whole section addressing more of the "what is REST" angle as well.
In terms of REST vs. SOAP, this question seems to have a number of good responses, particularly the selected answer.
I would imagine YMMV, but I found it very easy to start understanding the details of REST after I realised how REST essentially was a continuation of the static WWW concepts into the web application design space. I had written (a rather longish) post on the same : Why REST?
Scalability is an obvious benefit of REST (statelessness, caching).
But also - and this is probably the main benefit of hypertext - REST is ideal for when you have lots of clients to your service. Following REST and the hypertext constraint drastically reduces the coupling between all those clients and your server, which means you have more freedom when evolving/developing your service over time - you are not tied down by the risk of breaking would-be-coupled clients.
On a practical note, if you're working with rails - then restfulie is a great little framework for tackling hypertext on the client and server. Server side is a rails extension, and client is a DSL for handling state changes. Interesting stuff, check it out here: http://restfulie.caelum.com.br/ - I highly recommend the tutorial/demo vids they have up on vimeo :)
Content-Type: text/x-flamebait
I've been asking the same question lately, and my supposition is that
half the problem with explaining why full-on REST is a good thing when
defining an interface for machine-consumed data is that much of the
time it isn't. OK, you'd need a really good reason to ignore the
commonsense bits (URLs define resources, HTTP verbs define actions,
etc etc) - I'm in no way suggesting we go back to the abomination that
was SOAP. But doing HATEOAS in a way that is both Fielding-approved
(no non-standard media types) and machine-friendly seems to offer
diminishing returns: it's all very well using a standard media type to
describe the valid transitions (if such a media type exists) but where
the application is at all complicated your consumer's agent still
needs to know which are the right transitions to make to achieve the
desired goal (a ticket purchase, or whatever), and it can't do that
unless your consumer (a human) tells it. And if he's required to
build into his program the out-of-band knowledge that the path with
linkrels create_order => add_line => add_payment_info => confirm is
the correct one, and reset_order is not the right path, then I don't
see that it's so much more grievous a sin to make him teach his XML
parser what to do with application/x-vnd.yourname.order.
I mean, obviously yes it's less work all round if there's a suitable
standard format with libraries and whatnot that can be reused, but in
the (probably more common) case that there isn't, your options
according to Fielding-REST are (a) create a standard, or (b) augment
the client by downloading code to it. If you're merely
looking to get the job done and not to change the world, option (c)
"just make something up" probably looks quite tempting and I for one wouldn't
blame you for taking it.

Why is EDI still used, and how to deal with it?

Why is this archaic format still used in the face of easier-to-use technologies? Does it provide some benefit that I'm not seeing? It seems that a large amount of vendors still provide data only in this format, instead of something more manageable and easier to use such as XML; at the least it would make sense to me to offer both formats.
Also, what are some good ways to deal with and utilize EDI when you have no other choice but to use it? Something like BizTalk is out of the question as it's far too expensive. Are there any free/open source applications that make EDI easier to work with?
EDI is not that hard to understand once you familiarize yourself with the delimiters it uses. You might ask yourself as well why anyone would still be using CSV or tab-delimited data.
The answer is probably that those formats are "domain specific languages" defined by committee and standardized in a certain industry, and that a lot of money has already been invested in supporting those formats. Where's the business case to throw that all out again?
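To illustrate the delimiter point above: mechanically, an ANSI X12 interchange is just segments split by a terminator (often ~) whose elements are split by a separator (often *). The segments below are shortened fakes, not a valid interchange, and real files declare their actual delimiters in the ISA header, so a robust reader discovers them rather than hard-coding them:

```python
# Shortened, invented X12-style data (NOT a complete valid interchange).
raw = "ST*850*0001~BEG*00*NE*PO1234**20230101~SE*3*0001~"

# Split into segments on the terminator, then each segment into elements.
segments = [s for s in raw.split("~") if s]
parsed = [seg.split("*") for seg in segments]
# parsed[1] is ['BEG', '00', 'NE', 'PO1234', '', '20230101']
```

The delimiters are the easy part; as the surrounding answers note, the hard part is the semantics each trading partner attaches to each element.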
One word: inertia. Developing the EDI formats by committee between various companies and organisations with different agendas was a nightmare (sad to say, I have been there).
Asking them to abandon these for yet another round of committees agreeing on web service API standards is going to take even longer. How do you sell the idea of replacing one electronic format with another to a non-technical board? What possible business advantage does it give them? Originally the benefits of electronic exchange were clear, but replacing one format with another is not. We're talking really big companies here.
You may be interested in the following project:
http://bots.sourceforge.net/en/index.shtml
Google code archive
A little information for all interested. EDI is basically a design-by-committee data exchange format that not only set out rules for data formatting (like XML), but also set out to define every document that could possibly be sent between two companies. So for any piece of data that could be exchanged, they came up with an exact definition of what was supposed to be in each of these documents. Of course, nobody could foresee every piece of data that two companies would want to exchange, so you end up with companies using fields that were defined for one thing to carry some other piece of information.
What you end up with is an extremely convoluted data format, in which many users don't follow the standards because they need to send custom information the standard doesn't account for. So in the end you still need to talk to each company you want to deal with and learn all the little idiosyncrasies of their implementation, just as you would with a custom XML interface. Except that in the case of EDI the format is hard to parse and even harder to write well, so you end up doing a whole bunch of work just to send a document, when doing the same kind of thing with a custom XML solution would have caused far fewer problems.
And switching to XML would give you what? A slightly easier-to-debug line format?
Generally you set it up and leave it, there isn't a lot of need to play with the raw EDI feed, certainly not enough to abandon the standard and start again.
There are lots of standards, like FAX that could be made more readable but no real pressing need to change them.
Because it's a formally established Standard (in fact a very large and comprehensive set of standards). And that's one of the claimed benefits of a standard - you won't need to change anything for a long time.
And to change it, it takes agreement between two or more (often thousands and thousands more) trading partners (including maybe all of your competitors) to agree.
EDI formats have much higher signal-to-noise ratios (because they were designed back when that was considered important.) Someone who knows and understands EDI will look at your XML and say "Where's the beef (data)?"
Very few developers write their own parsers. There are many good mappers available (and many legacy and enterprise apps come with them built in). So there's lots of relief available for your pain (including at least one Open Source app on SourceForge).
"If it ain't broke, don't fix it."
Most of these organisations are processing vast amounts of data using EDI, and aren't about to change to something more modern without a compelling reason. And making things easy for third-party developers doesn't usually qualify, sad to say.
IMHO there are several problems with EDIFACT.
It is not easy to parse or generate an object model from it. This is probably not a big problem anymore, as there are now good systems around that do it for you, e.g. smooks.org
It is not easy to read. You get used to it, but XML is a lot easier to read
Validation isn't easy either (compare that to validating XML)
There are far too many different versions and flavours: D95B, D96B, D00A, D00B, etc.
But I think the biggest problem is that everyone uses the standards differently. They use the same 'format', but the fields are defined differently. We use EDIFACT to send and receive messages from container terminals, and they all have slight differences. For example, they would all use a D95B CODECO, but for some terminals a certain segment is mandatory while for others it is optional or even not allowed to be there. Then you have segments that are used the same way but whose content differs.
So to summarise it: It is a pain in the neck.
EDI is a very compact format and is often used to keep bandwidth usage in data exchanges as small as possible. The German customs offices for example use it in their ATLAS system to exchange a very high volume of data every day.
It is hard to parse and hard to read, but if the size of the resulting data matters, it can be a good choice and is supported by most of the bigger business applications.
Legacy Support
EDI is prolific in many industries. It would be prohibitively expensive to replace an already-working technology with a newer one.
Consider this: Walmart uses EDI to communicate with its vendors, stores, distribution chain, etc. I'm guessing they deal with tens of thousands of vendors. Every one of them has sunk thousands of dollars into EDI technology. If Walmart decided to switch over to XML, it's a decision that affects thousands of companies, not just Walmart.
This is true for any EDI user. After all, it's a standard used between trading partners.
I agree, EDI is a pain to work with. But 'back in the day', that's all we had.
EDIFACT is one of the best standards when it comes to document interchange.
Most problems come from trading partners sending non-standardized documents.
Yes, it's a bit of an odd format and is tedious to work with if you don't know the ins and outs, but that goes for XML as well.
You really want XML over EDIFACT? Look at the bloated, hard-to-read XML standards PEPPOL (Pan-European Public Procurement Online) is working on.
Yes, XML works nice and dandy if you don't have any errors in the systems, but troubleshooting EDIFACT is so much easier once you get used to the format than troubleshooting UBL documents.
You say you have $0.00 to spend on the project?
You really should look into the amount of manual work done in your company and the cost savings EDI can offer; some cost-benefit analysis can be mighty handy.
What types of information can be exchanged via EDI?
A variety of types of business information can be exchanged via EDI, including:
- Booking information
- Bill of Lading information
- Invoicing
- Electronic Funds Transfer
- Arrival Notice information
- Shipment Status information
How would choosing EDI benefit my company?
- It streamlines the communications process between you and APL
- It eliminates the need to rekey data, thus eliminating errors and the need to recheck information
- It eliminates paper handling and the need for document storage
- It improves the turn time and the accuracy of your data
- It eliminates the need for faxing
One solution, although it will cost you, is to go to a company like ADX, which has tools you can use to convert EDI formats to more pleasing formats like CSV. Depending on the volume and type of transactions you are doing, this can be both affordable and a lot less stressful. I've used their products in the past, and while they are a bit of work to set up, they do work quite well, and are very stable. Because of the history of EDI, you could probably find hundreds of other companies that offer similar services.
EDI has been around since before XML. Apart from the fact that two parties can pre-negotiate the EDI format that works for them both, you must also consider the role of the VAN (value-added network).
In some cases the VAN performs validation of the message, or even reads the message and performs actions on it, such as copying it to additional parties based on its content.
The only real reason to use EDI is because "that's the way it's always been done", and therefore there is a lot of existing infrastructure around to support it. Why switch to XML when there is no need? And who is to say XML won't be replaced by JSON, which will then be replaced by something else?
Another reason is that these are business messages such as orders, invoices, credit notes, etc.: there is a lot of financial worth in the transactions, and they need to be secure, but perhaps more importantly they need end-to-end validation and verification as well as non-repudiation.
For example, I send you an order for half a million euros' worth of goods, you send me the goods, then I "lose" the order information and tell you I am not paying. The combination of the standards and the VANs makes this almost impossible, or at least leaves so much of an audit trail that the problem could be tracked. This is why "oh, let's use XML and the internet instead of EDIFACT and the VANs" efforts tend to fail. As someone else answered: inertia, but inertia founded in a stable, effective, secure, reliable and well-understood system.
Doing it on the cheap is not always an option.
If it is any consolation, when I first implemented EDI in '87 there was virtually no software around, so I got the Interbridge tables and wrote my own parser for the UK TRADACOMS standard using Cognos software on an HP mini, and it worked fine. Assuming you are trading with other EDI partners, the cost probably comes at the point of needing to use a VAN.
I've used EDI (ANSI X12 and EDIFACT) in two projects involving maritime transport logistics and found them very useful, since most ocean carriers and trading partners accept them as the standard way of communicating between their different systems.
So the EDI format is still used and will continue to be used, since it's an established standard, thousands of companies have developed systems around it, and replacing it is a really big deal.
I've had to use EDI as well, and I agree. We used BizTalk to map it, which worked well. Many systems are built on EDI (from well before XML).
