What languages have strong string parsing capability like Perl's? [closed] - parsing

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am familiar with Perl's strong parsing abilities using regular expressions.
Is it efficient?
What other languages have strong parsing ability and perform efficiently?

You can have a look at this benchmark which shows how different programming languages perform with regards to memory consumption and speed.

SNOBOL and Icon are two other languages devoted to string manipulation. The first is rather old, while the second is not widely used.
Anyway, I would start from your problem. Depending on what you are trying to achieve (and your constraints), you might discover that even AWK, sed or gema would be a perfect match for your needs. Or not ...
I would dare to say that if parsing is so prominent in your task, you might benefit from using a parser generator (lex/yacc, ANTLR, lemon, ...).

Pretty much all modern languages have regular expressions that are relatively efficient: Java, C#, PHP, Python, even JavaScript (amongst others).

I would say Python.
EDIT: I came across pystring, in case you're working in C++ but seek the flexibility of Python strings.

Powerbasic is well worth checking out. They have two versions. The Console Compiler would be ideal if you do not need GUI.
It is not on the Benchmark link above but it is extremely fast. I use it extensively for writing utilities to do specialized tasks.

Most languages these days have fast regexp libraries that you can use for your purposes. Perl's strength is that these are integrated into the language itself so you can do a lot of string processing with just the language core (as opposed to say, Python where it's a separate module).
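To illustrate the "separate module" point for Python, here is a minimal sketch using the standard `re` module. The log line, the pattern and the function name `parse_line` are made up for this example; they are not taken from the question.

```python
import re

# Hypothetical input line, invented for this illustration.
LINE = "2009-06-03 ERROR disk full on /dev/sda1"

# Compile the pattern once and reuse it; named groups keep the fields readable.
PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)"
)

def parse_line(line):
    """Return a dict of the captured fields, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None

print(parse_line(LINE))
```

In Perl the same match would be written with the built-in `=~` operator; in Python the import and the explicit `compile`/`match` calls are the visible cost of regular expressions living in a library.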


Which language has best community for Data mining and Machine learning? Python, Java, c++ or any other? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I have skills in C++, Ruby, Rails, and some other scripting languages. I want to start experimenting with and learning concepts in data mining and machine learning.
I want to be well equipped with the programming knowledge required for those concepts.
Which language has the best support for DM and ML? Python, Java, C++? Is there anything coming up in JavaScript?
Thanks
Use whatever you are most comfortable with.
At least if it has the basics around. JavaScript and PHP are not very good in this domain. They just don't scale well for numerical computations. Python, R, Java and Scala are the most popular. There's MATLAB, but there is not much happening there anymore. There is Julia, which has a similar syntax but is much more alive and shows some promise, were it not for its column-major, 1-indexed arrays and MATLAB-like syntax. Some use Lua, others Mathematica...
There are many many factors that play a role.
For example, scripting languages like Python and R are really slow, but these two also interface very well with C libraries (and Fortran!), so if you mostly use them as a "driver" and the libraries do all the work, then they can be very usable. Just make sure not to assume every module is fast...
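A small sketch of the "driver" idea, using only the standard library: the same sum is computed once in an interpreted Python loop and once by the built-in `sum`, which CPython implements in C. The function name `python_loop_sum` and the data size are arbitrary choices for this illustration, and timings depend on the machine, so no figures are claimed.

```python
import timeit

def python_loop_sum(values):
    """Add the numbers one at a time in interpreted Python code."""
    total = 0
    for v in values:
        total += v
    return total

data = list(range(100_000))

# Both approaches must agree on the result.
assert python_loop_sum(data) == sum(data)

# Time each approach; the built-in does its looping inside C code.
loop_time = timeit.timeit(lambda: python_loop_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)
print(f"Python loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```

The same pattern applies to numerical packages: the more of the inner loop that runs inside the compiled library, the less the interpreter's speed matters.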
I think that perhaps your question is a bit off target. The languages themselves don't generally have the support: it's packages that interface with those languages, such as Apache Spark (interface to their ML package), Intel's MKL (vector and matrix operations optimized for Xeon Phi), SciKit (Python interface), etc.
That said, I see the most active support for languages that drive at distributed processing. In my ambit, Java/Spark is currently the front-runner, but one or two major releases can change the market considerably -- see the buzz on Tensor, for instance, or the staying power of BeautifulSoup.
For experimentation, start with your comfort zone. There are plenty of good tools that interface well with Ruby and C++, as well. As long as you're using this to learn the underlying concepts, I believe that you'll do best with a language you already know: that gives you one less area of frustration in your learning curve.
Anony-Mousse and Patricio have given you very good points with which I totally concur. I'm working in Python and Scala, with Java and Spark just underneath.
Python has very strong support from the data science community, you have very good packages like Pandas, and Python integrates very well with Spark.

Will there be a dynamic code injection for dart? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am currently preparing a talk about Polymer.dart and would like to give a short introduction to Dart. There is one question I would like to be prepared for:
Will there ever be dynamic code injection via <script> for Dart?
This article says that there is currently no support for this for a good reason.
However, the word "currently" qualifies the statement a bit, and I wonder whether anything is planned to support dynamic code injection in the future.
If for example the "eval" command is introduced in Dart, then the answer is YES, Dart is vulnerable to injection attacks.
JavaScript is in this regard like SQL: it has the same vulnerability as all other dynamically interpreted programming languages (this includes all shell scripts, PHP...), which I call "DATA IS CODE". Such languages have a concrete syntax which is meant for human consumption, and their processing entails a first step called PARSING: the sequence of characters is broken down into an internal structure which describes the meaning of the expression, in a way that lets the computer distinguish the DATA from the INSTRUCTIONS. It is the same problem that led to the introduction of the NX (No-eXecute) bit on modern CPUs. Functions like "eval" open the door for malicious code to be executed with no constraint. Parsing code at runtime should NEVER be allowed in a secure language.
This is why Dart doesn't recommend the use of injections, as explained here:
https://www.dartlang.org/articles/embedding-in-html/#no-script-injection-of-dart-code
"No script injection of Dart code: We do not currently support or recommend dynamically injecting a <script> tag that loads Dart code. Recent browser security trends, like Content Security Policy, actively prevent this practice."
But Google should do more than that, and forbid it entirely, together with the "eval" command.
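The "DATA IS CODE" risk described above can be sketched in Python rather than Dart, since Python is one of the dynamically interpreted languages that ships an `eval`. The helper name `parse_untrusted` is hypothetical. It relies on the standard `ast.literal_eval`, which accepts only literals (numbers, strings, lists, dicts and so on) and refuses anything that would execute code.

```python
import ast

def parse_untrusted(text):
    """Parse a Python literal from untrusted text without executing code.

    Returns None when the text is not a plain literal. (A caller that needs
    to distinguish the literal "None" from a rejection should raise instead.)
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

# Plain data is accepted.
print(parse_untrusted("[1, 2, 3]"))

# Passing this string to eval() would import a module and call a function;
# literal_eval rejects it because it is not a literal.
print(parse_untrusted("__import__('os').getcwd()"))
```

The design choice is the one the answer argues for: the input is parsed strictly as data, and there is no path by which it can become instructions.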
It is better to direct such questions to your crystal ball ;-)
Google is very reluctant to make statements about such things.
There were discussions in the past and they considered it and they might reconsider it eventually.
Currently the only option is to launch new isolates, and even this is still a work in progress with limitations that make the feature hard to use (no access to the browser API for client isolates, for example).
I'm not sure this question can really be answered, as it has probably not been decided.
Based on what's written on that page, I think it's very unlikely (especially given the other rules, like one script tag and a single main entry point).
But as with everything, things can change!

Parsing math equations [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Just for kicks, I'm trying to create an application that can simplify, factor, and expand algebra equations. Programming the rules seems as if it will be straightforward if I can get the equations into a good workable format. Parsing the equations is proving to be a hassle. I'm currently working with Python, but I'm not against having to learn something new.
Are there any libraries (for any language) that would make this project pretty simple, or is that a pipe dream?
[Tagging this with Haskell because I have a feeling that's where the 'simple' is]
Yes, Haskell has many many libraries that make writing parsers reasonably easy. Parsec is a good start, and it even has clones in other languages, including Python (that article also links to pyparsing which looks like it might also work).
This answer of mine is an example (note, it's probably not top-notch Parsec or Haskell): it's indicative of the power of Haskell's parsing libraries, precisely 4 lines of code implement the whole parser.
You could also browse old questions and answers to get a feel for the various libraries and techniques, e.g. parsec, parsing+haskell and parsing+python.
The best way to work out your line of attack for the larger project would be to start small and just try stuff until you're comfortable with your tools: choose a library and try to implement a relatively simple parser, like parsing expressions with just numbers, + and *, or even just parsing numbers and + with bracketing... something small (but not too small; those two examples each have non-trivialities, the first has operator precedence and the second has recursive nesting). If you don't like the library much, try a different library.
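As a sketch of the "start small" exercise described above, here is a hand-written recursive-descent parser in Python for non-negative integers, `+`, `*` and brackets. It is not Parsec or pyparsing; the names `tokenize`, `Parser` and `evaluate`, and the nested-tuple tree format, are choices made for this illustration. Operator precedence comes from splitting the grammar into `expr` and `term`, and bracket nesting comes from `factor` calling back into `expr`.

```python
import re

# One token is a run of digits or a single operator/bracket character.
TOKEN = re.compile(r"\s*(\d+|[+*()])")

def tokenize(text):
    """Split the input into number, operator and bracket tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        # expr := term ('+' term)*
        node = self.term()
        while self.peek() == "+":
            self.take()
            node = ("+", node, self.term())
        return node

    def term(self):
        # term := factor ('*' factor)*   (binds tighter than '+')
        node = self.factor()
        while self.peek() == "*":
            self.take()
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return int(token)
        raise ValueError(f"unexpected token {token!r}")

def evaluate(node):
    """Walk the tuple tree and compute its value."""
    if isinstance(node, int):
        return node
    op, left, right = node
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) * evaluate(right)

tree = Parser(tokenize("1 + 2 * (3 + 4)")).parse()
print(tree)            # ('+', 1, ('*', 2, ('+', 3, 4)))
print(evaluate(tree))  # 15
```

Once the input is a tree like this, simplification and expansion rules become functions from trees to trees, which is the "workable format" the question asks about.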
It's been done in just about every language.
Python has a library for parsing algebraic equations and symbolic mathematics all ready to go:
http://code.google.com/p/sympy/
I'd recommend reusing, unless your purpose is to learn how to write such a thing.
Python or MATLAB would be my suggestions. Are you planning on storing the whole equation in a string and then splitting it up to factor and simplify?
Give some more information; it's kind of a cool project.
This is an old question, but I'd like to suggest MathParseKit.
This is a C++ library that, given a string like "2*3/4", gives you a tree of functions/variables/constants that defines the expression.
You can solve it, and you can also change it and convert it back to string format.
You can find it at:
https://github.com/B3rn475/MathParseKit
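The string-to-tree-and-back workflow described above can be sketched in Python with the standard `ast` module. This is not MathParseKit's API, which is not shown here; the `evaluate` helper and the `OPS` table are written for this illustration, and `ast.unparse` assumes Python 3.9 or later.

```python
import ast
import operator

# Map the AST operator node types we support to plain functions.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(node):
    """Compute the value of an arithmetic expression tree."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("unsupported expression")

# String -> tree -> value.
tree = ast.parse("2*3/4", mode="eval")
print(evaluate(tree))  # 1.5

# Change the tree (swap the divisor), then turn it back into a string.
tree.body.right = ast.Constant(value=8)
print(ast.unparse(tree))
print(evaluate(tree))  # 0.75
```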

A good F# codebase to learn from [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 4 years ago.
I've been teaching myself F# for a while now. I've read Programming F# by Chris Smith (great book) and I've written a few small scripts for getting the job done here and there.
But IMO the best way to learn a new programming language—and more importantly, the idioms that come with it—is to read a good open source codebase written in that language. Naturally, writing code in that language is crucial, but in the beginning, you're basically struggling with your own ignorance about how things should be done. You could perform certain tasks one way or the other, but it takes experience to realize the flaws and virtues of each. Even after you've gotten a firm grasp of how things work, reading the code of people who have an even firmer one helps a great deal.
Most would agree that the most insightful parts of any learn-a-programming-language book are the code examples, and reading a well-written open source codebase is the next level of that.
So are there any out there for F#?
Ref this question.
IMO, F# PowerPack is the best code base there.
Here are a few additional links that you may find interesting:
If you download F# for Visual Studio 2008, it also comes with the sources of the entire F# library. The code is sometimes a bit difficult and uses internal tricks in a few places, but it can be a very good resource for learning (for example, the Map type is a great example of a tree data structure).
There are some official F# Samples and some F# Community Samples (which include my 3D fractal, an example of working with quotations and a few shorter examples).
You can also look at the source code of the samples from my Real-World Functional Programming book. The later chapters especially contain relatively complex sample applications (parallel simulations of animals, a rectangle drawing application, etc.). The code has quite a lot of comments, so it may be useful for learning F#.
I would say that the WPF F# control codebase at http://wpffsharp.codeplex.com/ is a good place to start. One of the least trivial aspects of F# is UI and this is an excellent start to UI in F#. Also, since the code base is written by someone also learning F#, you have the benefit of seeing the iterations they go through.

lexers / parsers for (un) structured text documents [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 5 years ago.
There are lots of parsers and lexers for scripts (i.e. structured computer languages). But I'm looking for one which can break an (almost) non-structured text document into larger sections, e.g. chapters, paragraphs, etc.
It's relatively easy for a person to identify them (where the table of contents, the acknowledgements, or the main body starts), and it is possible to build rule-based systems to identify some of these (such as paragraphs).
I don't expect it to be perfect, but does anyone know of such a broad 'block-based' lexer / parser? Or could you point me in the direction of literature which may help?
Many lightweight markup languages like Markdown (which incidentally SO uses), reStructuredText and (arguably) POD are similar to what you're talking about. They have minimal syntax and break input down into parseable syntactic pieces. You might be able to get some information by reading about their implementations.
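A minimal rule-based sketch of that block-level approach, in Python: split on blank lines, then label each block with a simple heuristic. The heading rule (a single short line with no closing punctuation) is an assumption made for this example, not how any of the markup languages above actually define headings, and the sample text is invented.

```python
import re

def split_blocks(text):
    """Split text into blocks separated by one or more blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]

def classify(block):
    """Label a block 'heading' or 'paragraph' using an assumed heuristic."""
    lines = block.splitlines()
    is_short_single_line = len(lines) == 1 and len(lines[0]) < 60
    ends_like_sentence = block.endswith((".", "!", "?", ":"))
    if is_short_single_line and not ends_like_sentence:
        return "heading"
    return "paragraph"

# Invented sample document.
SAMPLE = """Table of Contents

Chapter 1

It was a dark night. The rain
fell on the roof.

Acknowledgements

Thanks to everyone who helped.
"""

labelled = [(classify(block), block) for block in split_blocks(SAMPLE)]
for kind, block in labelled:
    print(f"{kind:9} | {block.splitlines()[0]}")
```

Rules like these break quickly on real documents, which is why the heuristics usually need tuning per corpus.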
Define the annotation standard, which indicates how you would like to break things up.
Go on to Amazon Mechanical Turk and ask people to label 10K documents using your annotation standard.
Train a CRF (which is like an HMM, but better) on this training data.
If you actually want to go this route, I can elaborate on the details. But this will be a lot of work.
Most lex/yacc-style programs work with a well-defined grammar. If you can define your grammar in a BNF-like format (most parsers accept a similar syntax), then you can use any of them. That may be stating the obvious. However, you can still be a little fuzzy around the 'blocks' (tokens) of text that would be part of your grammar. After all, you define the rules for your tokens.
I have used the Parse-RecDescent Perl module in the past, with varying levels of success, for similar projects.
Sorry, this may not be a good answer; it is more a matter of sharing my experience with similar projects.
Try Pygments, GeSHi, or prettify.
They can handle just about anything you throw at them and are very forgiving of errors in your grammar as well as your documents.
References:
Gitorious uses prettify,
GitHub uses Pygments,
Rosetta Code uses GeSHi.
