I have a task. Of course I am not expecting a ready-made solution, but an outline would be very helpful. Please help, as Lua is a new language for me.
So the task is:
I have three XML files. All of them store data about the same objects, say equipment. Apart from the name of the equipment, the parameters each XML file stores are different.
Now I want to produce one generic XML file that carries all the data (all parameters) about each piece of equipment.
Please note that the name is unique and thus acts as the key.
I want to achieve this task with Lua script.
Lua does not handle XML "by default". It is a language designed to be "embedded" into other systems, so the system you have it embedded in may be able to parse the XML files and pass them on to Lua. If that's the case, translate the XML to Lua tables on the host system, hand them to Lua, manipulate them in Lua, and return the resulting Lua table so that the host can transform it back to XML.
Another option, if available, would be installing a binary library for parsing xml, such as luaxml. If you are able to install it in your system, you should be able to manipulate the xml files more or less easily directly from Lua. But this possibility depends on the system you have embedded Lua into; a lot of systems don't allow installation of additional libraries.
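Whichever side does the parsing, the merge step itself is small: index every record by its unique name and fold the parameters from each file into one table. Here is a minimal sketch of that logic, written in Python rather than Lua purely for illustration. The layout it assumes (an `equipment` element with a `name` attribute and one child element per parameter) and the `equipments` root are hypothetical, since the question does not show the actual files.

```python
import xml.etree.ElementTree as ET

def merge_equipment(xml_texts, item_tag="equipment", key_attr="name"):
    """Merge records from several XML documents, keyed by their name."""
    merged = {}  # name -> {parameter: value}
    for text in xml_texts:
        root = ET.fromstring(text)
        for item in root.iter(item_tag):
            name = item.get(key_attr)
            if name is None:
                continue  # a record without the key cannot be merged
            params = merged.setdefault(name, {})
            for child in item:
                # A later file overwrites a parameter of the same name.
                params[child.tag] = (child.text or "").strip()
    # Build the combined document (root tag is an assumption).
    out = ET.Element("equipments")
    for name, params in merged.items():
        node = ET.SubElement(out, item_tag, {key_attr: name})
        for tag, value in params.items():
            ET.SubElement(node, tag).text = value
    return out

a = '<list><equipment name="pump"><power>5</power></equipment></list>'
b = '<list><equipment name="pump"><weight>12</weight></equipment></list>'
c = '<list><equipment name="valve"><size>3</size></equipment></list>'
result = merge_equipment([a, b, c])
print(ET.tostring(result, encoding="unicode"))
```

In Lua the same shape would be a table indexed by name whose values are tables of parameters.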
Is there any way to parse a schema (XSD) file using Saxon? I need to display all possible XPaths for a given XSD.
I found a way in org.apache.xerces but wanted to implement logic in Saxon as it supports XSLT 3.0 (we want to use same lib for XSLT related functionality as well)
thanks in advance
Saxon-EE of course includes an XSD processor that parses schema documents. I think your question is not about the low-level process of parsing the documents, it is about the higher-level process of querying the schemas once they have been parsed.
Saxon-EE offers several ways to access the components of a compiled schema programmatically.
You can export the compiled schema as an SCM file in XML format. This format isn't well documented but its structure corresponds very closely to the schema component model defined in the W3C specifications.
You can access the compiled schema from XPath using extension functions such as saxon:schema() and saxon:schema - see http://www.saxonica.com/documentation/index.html#!functions/saxon/schema
You can also access the schema at the Java level: the methods are documented in the Javadoc, but they are really designed for internal use, rather than for the convenience of this kind of application.
Of course, getting access to the compiled schema doesn't by itself solve your problem of displaying all valid paths. Firstly, the set of all valid paths is in general infinite (because types can be recursive, and because of wildcards). Secondly, features such as substitution groups and types derived by extension make it challenging even when the result is finite. But in principle, the information is there: from an element name with a global declaration, you can find its type, and from its type you can find the set of valid child elements, and so on recursively.
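The recursive walk described above can be sketched on a toy model. This is not Saxon's API: the two dictionaries below are hypothetical stand-ins for whatever you extract from the compiled schema, and the sketch ignores wildcards, substitution groups and derived types. It only shows how cutting off a type that is already on the current branch keeps the output finite.

```python
# Hypothetical stand-in for a compiled schema:
# element name -> type name, and type name -> allowed child elements.
ELEMENT_TYPE = {
    "book": "BookType",
    "title": "string",
    "section": "SectionType",
    "para": "string",
}
TYPE_CHILDREN = {
    "BookType": ["title", "section"],
    "SectionType": ["title", "para", "section"],  # recursive type
    "string": [],
}

def paths(element, prefix="", active_types=frozenset()):
    """Yield the paths below an element, stopping at recursive types."""
    path = f"{prefix}/{element}"
    yield path
    type_name = ELEMENT_TYPE[element]
    if type_name in active_types:
        return  # this type is already open on the branch: do not expand again
    for child in TYPE_CHILDREN[type_name]:
        yield from paths(child, path, active_types | {type_name})

for p in paths("book"):
    print(p)
```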
I want to store and load diverse program data in a Delphi project. This data ranges from simple strings to more complex recurring configuration object data.
As we all know ini files provide a fast and easy way to store program data but are limited to key-value representations.
XML is often the weapon of choice when it comes to requirements like this but I want to know if there is an alternative to XML.
Recently I found superobject for Delphi which seems to be a lot easier to handle than XML. Is there anything to be said against using JSON for such "non web task"?
Are you aware of other options that support data storage and load in plain text (like ini, xml, json) in Delphi?
In fact it doesn't matter which storage format you choose (ini, xml, json, whatever). Build an abstract Configuration class that fits all your needs, and only then think about the concrete class and the concrete storage format. Decide based on how easy it is to implement and, maybe, on human readability.
In some cases you also want to have different configuration aspects (global, machine, user).
With your configuration class you can easily mix them together (use global if not user defined) and can also mix up storing formats (global-config from DB, machine-config from Registry, user-config from file).
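A minimal sketch of that idea, in Python for brevity rather than Delphi. All class names here are hypothetical. The calling code only sees `LayeredConfig.get`, so each layer is free to use a different storage format.

```python
from abc import ABC, abstractmethod
import configparser
import json

class ConfigSource(ABC):
    """Abstract source: returns a value for a key, or None if it is not set."""
    @abstractmethod
    def get(self, key):
        ...

class JsonSource(ConfigSource):
    """Flat JSON object such as {"ui.theme": "light"}."""
    def __init__(self, text):
        self._data = json.loads(text)

    def get(self, key):
        return self._data.get(key)

class IniSource(ConfigSource):
    """INI text; the key "ui.theme" means option "theme" in section [ui]."""
    def __init__(self, text):
        self._parser = configparser.ConfigParser()
        self._parser.read_string(text)

    def get(self, key):
        section, _, option = key.partition(".")
        return self._parser.get(section, option, fallback=None)

class LayeredConfig:
    """Ask each source in turn, most specific first (user, machine, global)."""
    def __init__(self, *sources):
        self._sources = sources

    def get(self, key, default=None):
        for source in self._sources:
            value = source.get(key)
            if value is not None:
                return value
        return default

global_cfg = JsonSource('{"ui.theme": "light", "ui.lang": "en"}')
user_cfg = IniSource("[ui]\ntheme = dark\n")
cfg = LayeredConfig(user_cfg, global_cfg)
print(cfg.get("ui.theme"), cfg.get("ui.lang"), cfg.get("ui.font", "mono"))
```

A database or Registry layer would be one more `ConfigSource` subclass; nothing else changes.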
Good old INI files work great for me, in combination with the built-in TIniFile and TMemIniFile classes in the IniFiles unit.
Benefits of INI files:
Not binary.
Easier to move from machine to machine than Registry settings.
Easy to inspect and view.
Unlike XML, it's simple and human readable.
INI files are easy to modify either by hand or by tool and are almost bulletproof. It's easy to produce malformed JSON or XML that is completely unreadable, whereas it's hard to do more than "damage one section" of an INI file. Simplicity wins.
Drawbacks:
Unlike XML and the Registry, it's more or less "two levels": sections and items.
TMemIniFile doesn't order the results in any controllable way. I often wish I could control the order of items in my ini files. If they are written by a human being, I would like that order to be preserved, and TMemIniFile does not preserve it, so I do not love TMemIniFile as much as I love plain old TIniFile.
I am new to Lua and want to ask whether it is possible to restrict Lua syntax in a config file. I know that config loading has to be performed in a jail, but how can we cope with while 1 do end in a config file we want to load? Is there a way to allow only strings, assignments and tables in a config file, and if not, what is the best way to check that a Lua file doesn't contain undesirable constructs? Is manual pre-parsing the only solution?
You seem to already know about "sandboxing" in Lua. So what's left is, as you say, malicious constructs like infinite loops. To detect those in general you would need to solve the Halting Problem, which is not practical.
Instead of "manually" parsing and hoping you find all the malicious content (you won't), how about just running your Lua interpreter with a timer set so that the script will be interrupted if it takes longer than N seconds?
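As a rough sketch of the timer idea, assuming the config is evaluated by a standalone interpreter launched as a child process (an embedded interpreter would need a different mechanism inside the host). The function name is hypothetical and the example is in Python.

```python
import subprocess

def run_with_timeout(cmd, seconds):
    """Run cmd; return its stdout, or None if it exceeded the time limit."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None
    return done.stdout
```

A call such as run_with_timeout(["lua", "config.lua"], 2) would then give up after two seconds. Both the `lua` executable on the PATH and the file name are assumptions.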
If you want to explicitly forbid certain constructs in Lua, you have to actually scan the file yourself. Note that there are valid uses for those constructs, even in config files, so you are restricting what the user can do.
It wouldn't be too hard to write a simple Lua lexer that ignores the contents of strings and comments, but errors on any of the Lua keywords other than return. Given proper sandboxing (ie: no functions are available to be called), that should be sufficient to weed out anything malicious.
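A minimal sketch of such a scanner, written in Python on the assumption that the check runs in the host before the file is handed to Lua. It skips short strings, long-bracket strings and comments, then reports every keyword outside the allowed set. By default only return is allowed, as suggested above; a real config would probably also want true, false and nil, which the `allowed` parameter makes possible. It is a pre-filter, not a full Lua lexer.

```python
import re

LUA_KEYWORDS = frozenset(
    "and break do else elseif end false for function goto if in local nil "
    "not or repeat return then true until while".split()
)
_LONG_OPEN = re.compile(r"\[(=*)\[")
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def _skip_long_bracket(src, i):
    """Return the index just past a long bracket opening at i, else None."""
    m = _LONG_OPEN.match(src, i)
    if not m:
        return None
    close = "]" + m.group(1) + "]"
    end = src.find(close, m.end())
    if end == -1:
        raise ValueError("unterminated long bracket")
    return end + len(close)

def find_forbidden_keywords(src, allowed=frozenset({"return"})):
    """List keywords used outside strings and comments that are not allowed."""
    found = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if src.startswith("--", i):              # comment
            end = _skip_long_bracket(src, i + 2)
            if end is None:                       # line comment
                newline = src.find("\n", i)
                end = n if newline == -1 else newline
            i = end
            continue
        if ch in "\"'":                           # short string
            i += 1
            while i < n and src[i] != ch:
                i += 2 if src[i] == "\\" else 1   # skip escaped characters
            if i >= n:
                raise ValueError("unterminated string")
            i += 1
            continue
        if ch == "[":                             # long string
            end = _skip_long_bracket(src, i)
            if end is not None:
                i = end
                continue
        m = _NAME.match(src, i)
        if m:
            word = m.group()
            if word in LUA_KEYWORDS and word not in allowed:
                found.append(word)
            i = m.end()
        else:
            i += 1
    return found

print(find_forbidden_keywords("while 1 do end"))
```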
Also, note that Lua 5.1 doesn't make it easy to keep the parser from parsing non-text data (ie: compiled Lua bytecode). 5.2 offers specific API support for forcing the loader to only recognize text and therefore reject bytecode.
I have some experience with Pragmatic-Programmer-type code generation: specifying a data structure in a platform-neutral format and writing templates for a code generator that consume these data structure files and produce code that pulls raw bytes into language-specific data structures, does scaling on the numeric data, prints out the data, etc. The nice pragmatic(TM) ideas are that (a) I can change data structures by modifying my specification file and regenerating the source (which is DRY and all that) and (b) I can add additional functions that can be generated for all of my structures just by modifying my templates.
What I had used was a Perl script called Jeeves which worked, but it's general purpose, and any functions I wanted to write to manipulate my data I was writing from the ground up.
Are there any frameworks that are well-suited for creating parsers for structured binary data? What I've read of Antlr suggests that it's overkill. My current target languages of interest are C#, C++, and Java, if it matters.
Thanks as always.
Edit: I'll put a bounty on this question. If there are any areas that I should be looking at (keywords to search on) or other ways of attacking this problem that you've developed yourself, I'd love to hear about them.
You may also look at a relatively new project, Kaitai Struct, which provides a language for that purpose and also has a good IDE:
Kaitai.io
You might find ASN.1 interesting, as it provides an abstract way to describe the data you might be processing. If you use ASN.1 to describe the data abstractly, you need a way to map that abstract data to concrete binary streams, for which ECN (Encoding Control Notation) is likely the right choice.
The New Jersey Machine Toolkit is actually focused on binary data streams corresponding to instruction sets, but I think that's a superset of just binary streams. It has very nice facilities for defining fields in terms of bit strings, and automatically generating accessors and generators of such. This might be particularly useful if your binary data structures contain pointers to other parts of the data stream.
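To make the "specification drives the parser" idea from the question concrete, here is a minimal sketch in Python using only the standard `struct` module. It is not how any of the tools above work internally. The record layout, field names and scale factors are all hypothetical.

```python
import struct

# Hypothetical record spec: (field name, struct format code, scale factor).
SENSOR_RECORD = [
    ("id", "H", 1),             # unsigned 16-bit
    ("temperature", "h", 0.5),  # signed 16-bit, half-degree steps
    ("pressure", "I", 0.25),    # unsigned 32-bit
]

def make_parser(spec, byte_order="<"):
    """Build a parse(raw) function from a declarative field list."""
    layout = struct.Struct(byte_order + "".join(code for _, code, _ in spec))

    def parse(raw):
        values = layout.unpack(raw)
        # Apply each field's scale factor to its raw integer value.
        return {name: value * scale
                for (name, _, scale), value in zip(spec, values)}

    parse.size = layout.size  # number of bytes one record occupies
    return parse

parse_sensor = make_parser(SENSOR_RECORD)
raw = struct.pack("<HhI", 7, -40, 1000)
print(parse_sensor(raw))
```

Adding a field or changing a scale factor means editing the spec list only, which is the DRY property the question asks for. Generating source for C#, C++ or Java from the same list would be a separate template step.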
I have a question about parsing HTML pages, specifically forums.
I want to parse a forum or thread for posts matching certain criteria. I haven't defined the
algorithm yet, since I have only parsed structured text formats before.
A use case may be to copy and paste each thread into the program by hand, or to insert a URL like
http://www.forums.com/forum/showthread.php?t=46875&page=3 and let the program parse the pages.
Given all this I would like to know:
Is it possible to parse a forum thread on an HTML page?
What would be the best/fastest/easiest language for doing this?
If I prefer Java, what tools/libraries do I need for this?
Is there anything else I should consider?
1 / yes
2 / Use some compact language like Python or Ruby for prototyping.
For Python there is a neat library for HTML/XML parsing called Beautiful Soup.
For Ruby, you could try nokogiri or hpricot.
3 / A Java tool to consider: htmlparser
4 / If you are interested only in some particular text or some special classes, a regular expression might be sufficient. But as soon as you want to dig deeper into the structure of the content, you'll need some kind of model to hold your data, and hence a parser, which, in the best case, can cope with the inconsistencies that occur in real-world HTML.
You might want to look into some sort of HTML parsing library, rather than using regular expressions to do this. There are some really good HTML parsers for Ruby and Python, but a quick Google search shows a number of parsers for Java as well. The benefit of these libraries is that you don't have to handle every edge case with regular expressions, and they handle malformed HTML (both of which can be impossible with regexes, depending on what you want to do). They also give you a much nicer way of dealing with the data: for example, Beautiful Soup lets you grab all elements that belong to a specific class, or use some other CSS selector to limit which page elements you want to deal with.
Personally, I would start in Ruby or Python, at least at the beginning, as the libraries are well known and there is a lot of information about using them for this purpose. Also, I find it easier to quickly prototype these kinds of things in Ruby or Python than on the JVM. You could even bring that code onto the JVM later with JRuby or Jython, if it becomes necessary.
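To give a feel for "grab all elements of a class", here is a minimal sketch that uses only `html.parser` from the Python standard library, standing in for Beautiful Soup so that it runs without installing anything. The class name `post` and the sample markup are hypothetical. Unlike the dedicated libraries, this sketch assumes reasonably well-formed HTML: an unclosed tag inside a post would throw off its depth counter.

```python
from html.parser import HTMLParser

class ClassTextCollector(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""
    VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0   # greater than 0 while inside a matching element
        self.posts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.posts.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.posts[-1] += data

page = ('<div class="post main"><p>Hello <b>world</b></p></div>'
        '<div class="sig">x</div>'
        '<div class="post">Second<br>line</div>')
collector = ClassTextCollector("post")
collector.feed(page)
collector.close()
print(collector.posts)
```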
yes
regular expressions, any flavor.
probably the ones w/regex
there are tools out there that will do this for you.