Config file format - parsing

Does anyone know of a configuration file format that is easy for humans to read? I want something like tag = value, where a value may be one of:
String
Number (int or float)
Boolean (true/false)
Array (of String, Number, or Boolean values)
Another structure (what I mean will be clearer in the following example)
Now I use something like this:
IntTag=1
FloatTag=1.1
StringTag="a string"
BoolTag=true
ArrayTag1=[1 2 3]
ArrayTag2=[1.1 2.1 3.1]
ArrayTag3=["str1" "str2" "str3"]
StructTag=
{
NestedTag1=1
NestedTag2="str1"
}
and so on.
Parsing this is easy, but I find large files hard to read and edit in a text editor. I don't like XML for the same reason: it's hard to read. INI does not support nesting, and I want to be able to nest tags. I also don't want a complicated format, because I will only use the limited kinds of values mentioned above.
Thanks for any help.

What about YAML? It's easy to parse, nicely structured, and has wide programming-language support. If you don't need the full feature set, you could also use JSON.

Try YAML - it is (subjectively) easy to read, allows nesting, and is relatively simple to parse.
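For illustration, here is roughly how the sample config from the question might look in YAML (a sketch: indentation is significant, and flow-style sequences like [1, 2, 3] are also valid):
IntTag: 1
FloatTag: 1.1
StringTag: "a string"
BoolTag: true
ArrayTag1: [1, 2, 3]
ArrayTag2: [1.1, 2.1, 3.1]
ArrayTag3: ["str1", "str2", "str3"]
StructTag:
  NestedTag1: 1
  NestedTag2: "str1"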

Related

What language is this Salesforce code that I need to wrap?

I'm working on a Salesforce coding issue. Let me preface this by saying I'm not a developer or Salesforce expert.
What language is this?
Data Type: Formula (this formula references multiple objects)
IF (Fulfillment_Submission_Form_URL__c <> "" && CONTAINS(Fulfillment_Submission_Form_URL__c, "qualtrics"),
Fulfillment_Submission_Form_URL__c &
(IF (CONTAINS(Fulfillment_Submission_Form_URL__c,"?SID="), "&", "?")) &
(IF (CONTAINS(TEXT(Type__c), "Site Visit"),
"ContactId="&Statement_of_Work__r.Contractor_Contact__c&
"&CoachType="&SUBSTITUTE(Statement_of_Work__r.Work_Type__r.Name," ","%20")&
"&CoachName="&SUBSTITUTE(Statement_of_Work__r.Contractor_Name__c," ","%20")&
"&InitPartId="&Initiative_Participation__r.Id&
"&InstitutionName="&substitute(substitute(SUBSTITUTE(Institution_Name__c," ","%20"),")",""),"(","")&
"&AccountId="&Initiative_Participation__r.Participating_Institution__r.Id&
"&TodaysDate="&TEXT(TODAY())&
"&SOWLineItemId="&Id&
"&LeaderCollege="&Initiative_Participation__r.ATD_Leader_College_Status__c&
"&SVRCompleted="&TEXT(Count_of_Site_Visit_Fulfillments__c)&
"&SVRRequired="&TEXT(Number_of_Work_Units_Allocated__c),
IF (CONTAINS(TEXT(Type__c), "Feedback"),
"InitPartId="&Initiative_Participation__r.Id&
"&SOWLineItemId="&Id&
"&ReportYear="&Statement_of_Work__r.SOW_Year__c&
"&UserId="&Contractor_User_Id__c&
"&InstitutionName="&substitute(substitute(SUBSTITUTE(Institution_Name__c," ","%20"),")",""),"(",""),
"")
))
,"")
Essentially it's pulling a link from another product we've integrated with. We then take the basic link and reformat it to add parameters.
The problem is that when it pulls in some parameters (e.g. CoachName), the coach may have entered their name in a strange format, like: John (Coach) Doe.
So when the script outputs a URL that includes parameters, it breaks at the &CoachName=John%20(Coach)% portion of the URL. Is there any easy way to work around this by modifying the script? Unfortunately we DO need that (Coach) identifier, because the system we push to grabs that as well.
It's Salesforce formula syntax; I'd compare it to Excel-like formulas. There's self-paced training available if you don't want to read the documentation. And as it's not exactly code-related, you may have more luck on the dedicated site https://salesforce.stackexchange.com/. More admins lurk there.
So you do want that "(Coach)" to go through, but it breaks the link? It looks like ( is a special character. It's not technically wrong to have unescaped parentheses; if they break that other site, you might want to contact them and get their act together. The RFC doesn't force us to encode them, but it looks like you'll have to in order to solve this, at least in the short term: https://webmasters.stackexchange.com/questions/78110/is-it-bad-to-use-parentheses-in-a-url
Instead of the poor man's encoding (SUBSTITUTE(Statement_of_Work__r.Contractor_Name__c," ","%20")), try using the proper URLENCODE(Statement_of_Work__r.Contractor_Name__c).
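That should take care of the parentheses as well. As a rough sketch, the CoachName fragment above might then read (apply the same change to the other SUBSTITUTE calls):
"&CoachName=" & URLENCODE(Statement_of_Work__r.Contractor_Name__c) &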
Or there's a somewhat more "pro" function called URLFOR, though the documentation doesn't make it very clear how powerful the 3rd parameter is, with its [key1 = value1, key2 = value2] syntax. Basically, just pass the parameters and let SF worry about encoding special characters etc.
Read my answer at https://salesforce.stackexchange.com/a/46445/799, and there are some examples on the net, like https://support.docusign.com/s/articles/DFS-URL-buttons-for-Lightning-basic-setup-limitations?language=en_US&rsc_301

How to get HTML tag text by id using Lua

There is a webpage parser which takes a page containing several tags in a certain structure, where divs are badly nested. I need to extract a certain div element, and copy it and all its content to a new HTML file.
Since I am new to Lua, I may need basic clarification for things that might seem simple.
Thanks,
The ease of extracting the data is going to depend largely on the page itself. If the page uses exactly the same tag information throughout, it'll be much more difficult to extract from than it would be if it had named tags.
If you're able to find a version of the page that returns JSON, then you're that much better off. Here's a snippet from something I wrote to grab definitions from a webpage that did not have a JSON version:
local actualword, definition = string.match(wayup,"<html.-<td class='word'>%c(.-)%c</td>.-<div class=\"definition\">(.-)</div>")
Essentially, this code searched down the page until it found the class "word", and took the word after it (%c is the pattern for control characters). It continued on to "definition" and captured that, as well.
As you can see, it's a bit convoluted, but I had the luck of having specifically named tags for what I wanted.
This is edited to fit your comment. As a side note I should have mentioned before: if you're familiar with regular expressions, you can use the same model here to capture what you need. In this case, the pattern captures the string in its totality:
local data = string.match(page, "(<div id=\"aa\"><div>.-</div>.-</div>)")
It's rarely the fault of the language, but rather the webpage itself, that makes it hard to mine data. Since a webpage can literally have hundreds of lines of code, it's hard to pinpoint exactly what you want without coming across garbage information. That's why I prefer a simplified result such as JSON: Lua has JSON modules that can encode/decode it, and you can get at precisely the information you want.
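Putting it together for your case, a minimal sketch (assuming the page is already saved locally as page.html, and the target div is the id="aa" one from the pattern above):
-- read the whole saved page into a string
local f = assert(io.open("page.html", "r"))
local page = f:read("*a")
f:close()

-- capture the div and its (badly nested) content, as in the pattern above
local data = string.match(page, "(<div id=\"aa\"><div>.-</div>.-</div>)")

-- write the captured fragment out as a new HTML file
if data then
  local out = assert(io.open("extracted.html", "w"))
  out:write(data)
  out:close()
end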

How to get the string that caused parse error?

Suppose I have this code:
(handler-case (read ...)
(parse-error (condition)
(format t "What text was I reading last to get this error? ~s~&"
(how-to-get-this-text? condition))))
I can only see the parse-namestring accessors, but it gives the message of the error, not the text it was parsing.
EDIT
In my case the problem is less generic, so an alternative solution not involving the entire string that failed to parse can be good too.
Imagine this example code I'm trying to parse:
prefix(perhaps (nested (symbolic)) expressions))suffix
In some cases I need to stop on "suffix" and in others I need to continue; the suffix itself has no meaning other than being an indicator of the action the parser should take next.
READ parses from a stream, not a string. The s-expression can be arbitrarily long. Should READ keep a string of what's been read?
What you might need is a special stream. In standard Common Lisp there is no mechanism for user-defined streams, but in real life every implementation has such extensible streams. See for example 'gray streams':
http://www.sbcl.org/1.0/manual/Gray-Streams.html
There's no standard function to do it. You might be able to brute-force something with read-from-string, but whatever you do, it will require some extra work.
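For instance, here is a sketch of that extra work using a standard echo stream rather than a custom gray stream: everything READ consumes from the underlying stream is copied to a string stream, which you can inspect when the error is signalled (READER-ERROR, which READ signals, is a subtype of PARSE-ERROR):
(defun read-with-trace (in)
  (let* ((trace (make-string-output-stream))
         (echo (make-echo-stream in trace)))
    (handler-case (read echo)
      (parse-error ()
        (format t "Text read before the error: ~s~&"
                (get-output-stream-string trace))))))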

how to obtain URLs from Dmoz ODP

I want to use the database of URLs present in the DMOZ ODP for my application (an array of URL strings, or a file containing the same). Is there any way of obtaining it, other than manual copy-paste?
EDIT :
Is there any script/code to parse the RDF file?
Take a look at http://rdf.dmoz.org/; you'll need to find a way to parse the RDF into your database.
I did this the other day using the odp2db scripts from Steve's Software. They're old, but the format hasn't changed significantly so they work fine.
I found I didn't need to do the iconv and xmlclean.pl steps suggested in the readme, just uncompressed the dumps and ran the structure2db.pl and content2db.pl scripts. You'll need to create the database tables manually (see the SQL at top of script for that) and modify the connection details in the scripts before you start.
With the mid-January 2009 dump I used, there are 756,962 categories and 4,436,796 websites. It took a while to run through them all, but not excessively long, though I did dispense with the site descriptions as I didn't need them. Also, it may be worth adding database indices after creating the tables, to speed up access later. The raw structure and content files were 75MB and 300MB compressed, 848MB and 2GB uncompressed, respectively.
I've actually done this in Java. I just used the SAX API to read through the RDF files. It was pretty straightforward. In my case I wanted to pull out every URL that was in a topic with "Weblogs" in the topic name.
Basically, what I did was implement an org.xml.sax.helpers.DefaultHandler.
Then, to set up the parser, you do:
InputSource is = new InputSource(new FileInputStream("filename.rdf"));
XMLReader r = XMLReaderFactory.createXMLReader();
r.setContentHandler(new MyHandlerClass());
r.parse(is);
and that's pretty much it. In my handler class I had to implement:
startElement(String uri, String localName, String qName, Attributes attributes), where I had an if statement to check whether it was an "ExternalPage" tag, in which case I went into another state to look for "topic", "Title", and "Description". I had another,
characters(char[] ch, int start, int length), where I read in the topic, title, and description text, depending on which one had most recently been sent to startElement, and
endElement(String uri, String localName, String qName), where I checked which element was ending; if it was ExternalPage, that meant the end of the current page's data.
The whole thing was 80-90 lines of code for the basic parsing, so pretty easy to write. It was able to chew through the multi-gigabyte files in... I don't remember, maybe a minute or two? If you just want to query out some specific data, it might be easier to write the code to do that in your handler, rather than trying to load it all into a DB.
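A hypothetical skeleton along those lines (a sketch only: the element and attribute names ExternalPage, topic, and about are from memory of the DMOZ content dump, so check them against your file):
import java.io.FileInputStream;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Print the URL of every ExternalPage whose topic contains "Weblogs".
public class DmozHandler extends DefaultHandler {
    private String currentUrl;
    private final StringBuilder text = new StringBuilder();

    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("ExternalPage".equals(localName)) {
            // the page URL is carried in the "about" attribute
            currentUrl = attributes.getValue("about");
        }
        text.setLength(0); // reset the buffer for the next text node
    }

    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        if ("topic".equals(localName) && currentUrl != null
                && text.toString().contains("Weblogs")) {
            System.out.println(currentUrl);
        } else if ("ExternalPage".equals(localName)) {
            currentUrl = null; // done with this page
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader r = XMLReaderFactory.createXMLReader();
        r.setContentHandler(new DmozHandler());
        r.parse(new InputSource(new FileInputStream("content.rdf.u8")));
    }
}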
If you find a tool that works well, that's obviously better than writing your own code. But writing your own code isn't hard! RDF is just an XML format, and it's not deeply nested or anything. A simple SAX parser is easily doable in a day or so.
You could always pay one of the corrupt editors there and they will help you out :)

How to parse a .xfa file

Hoping that someone has some info on how to parse an .xfa file. I can parse CSV or XML files just fine, but an .xfa one has come along and I'm not familiar with the format. It looks like a tab-delimited body with column metadata at the top.
Has anyone dealt with these before, or can anyone give me a steer on how to parse them?
I use VB.NET, but the language of any solution isn't too relevant.
Much appreciated.
Mmm, looks like nobody has a clue. The problem is that .xfa doesn't look like a "standard" extension: after all, anybody can create their own extension names, from .xyz to .something...
I looked around a bit and found, unsurprisingly (given the 'x'), an XML format with this extension, but not much more.
Indicating where this kind of file comes from and what kind of data it holds might help. Or not.
You describe the file as a simple TSV (tab-separated values) with a header. That is quite trivial to parse with a tokenizer or some regex, so I am not sure where you are stuck.
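For instance, a minimal VB.NET sketch, under the assumption that the first line holds the column names and every following line is one tab-separated record (the file name is a placeholder):
Imports System
Imports System.IO

Module XfaSketch
    Sub Main()
        ' "sample.xfa" is a placeholder; adjust the path to your file.
        Dim lines = File.ReadAllLines("sample.xfa")
        ' Assumed layout: first line = column names, rest = tab-separated rows.
        Dim headers = lines(0).Split(ControlChars.Tab)
        For i = 1 To lines.Length - 1
            Dim fields = lines(i).Split(ControlChars.Tab)
            For j = 0 To Math.Min(headers.Length, fields.Length) - 1
                Console.WriteLine("{0} = {1}", headers(j), fields(j))
            Next
        Next
    End Sub
End Module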
I think you might be talking about this: http://en.wikipedia.org/wiki/XFA_forms
This seemed to be a page that was designed to deal with that template: http://www.w3.org/1999/05/XFA/xfa-template-19990614
That information should be enough to get the ball rolling. If that fails then you can always analyse the file itself for patterns and go from there. I don't see it being too tricky.
Anyway, I hope that helps.
P.S. If you could provide a link to that .xfa file, we could probably give you more help.
The original post says the content looks like "tab delimited body with column metadata at the top". An XFA form doesn't look anything like that - XFA forms typically use a *.xdp extension and are XML.
Check out the Adobe page:
http://partners.adobe.com/public/developer/xml/index_arch.html
(Adobe XML Forms Architecture, currently 1400 pages)
Let LiveCycle/Acrobat parse it for you.
