I've read a few questions that answer this, and I understand the regular expression I'm required to use, but actually applying it in MVC is where I stumble. I'll preface this by saying I'm terrible at regular expressions so far.
I'm writing a file upload application in MVC and I want to apply standard Windows filename validation: the characters \ / : * ? " < > | are invalid anywhere in the name.
My view model for this is set up like so, using a different regex I found:
public class FileRenameModel
{
    [RegularExpression(@"^[\w\-. ]+$", ErrorMessage="A filename cannot contain \\ / : * ? \" < > |")]
    [Required]
    public string Filename { get; set; }
    [Required]
    public int FileID { get; set; }
}
Whenever I try to change the regex to @"^[\\/:?"<>|]+$" the " in the middle kills it and throws an error. I haven't figured out how to escape it properly so that I can include it in the string. When I use the regex without the " it tells me that any string I put into the textbox fails. Am I using the ^ incorrectly?
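Use a doubled "" to escape a quote inside a verbatim string, i.e. one that starts with @.
To match anything except those characters, insert an additional ^ inside the brackets to create a negated character class: @"^[^\\/:?""<>|]+$". Without it, the pattern requires the whole string to consist only of those characters, which is why every ordinary name failed. Keep the ^ at the beginning as well, to anchor the match at the start of the string. Note that this class, like yours, leaves out the * that your error message lists; add it inside the brackets if you want to reject it too.
Having said that, keep in mind for validation that browsers handle file names differently. Some older browsers sent a path along with the filename, which might break your validation for a legitimate file.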
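The difference between the two character classes, and the path issue mentioned above, can be tried out in a small standalone sketch. It is written in Python rather than C# purely so it can be run on its own; the character-class behaviour shown is the same in both regex engines. The function names are hypothetical, and the class here includes the * from the question's list of invalid characters.

```python
import ntpath
import re

# Without the inner ^: the whole string must consist ONLY of the listed characters.
ONLY_INVALID = re.compile(r'^[\\/:*?"<>|]+$')
# With the inner ^: the whole string must contain NONE of the listed characters.
NO_INVALID = re.compile(r'^[^\\/:*?"<>|]+$')


def strip_client_path(raw_name: str) -> str:
    """Drop any directory part a browser may have sent along with the name.

    ntpath splits on both backslashes and forward slashes.
    """
    return ntpath.basename(raw_name)


def is_valid_filename(raw_name: str) -> bool:
    """Validate only the final path component against the negated class."""
    return NO_INVALID.match(strip_client_path(raw_name)) is not None
```

Stripping the path before validating is one possible design choice, not the only one; rejecting any name that contains a separator would be the stricter alternative.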
This regular expression should match a more-than-sufficient subset of valid NTFS filenames (bear in mind that an NTFS file name may contain pretty much any Unicode character.)
Regex rxValidFileName = new Regex(@"^[\p{IsBasicLatin}\p{IsLatin-1Supplement}\p{IsLatinExtended-A}\p{IsLatinExtended-B}-[\p{C}<>:""/\\|?*]]+$", RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
What this matches:
start-of-line, followed by
1 or more of
any Basic Latin, Latin-1 Supplement, or Latin Extended-A or -B character, unless
it's a C0 or C1 control character or one of the characters otherwise disallowed by NTFS: <>:"/\|?*
terminated by end-of-line.
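The same rule can be restated without a regex, which may make the "allowed blocks minus excluded characters" logic easier to follow. This is only a sketch in Python, not the .NET code: the function name is hypothetical, the four Latin blocks are taken to be the contiguous range U+0000 to U+024F, and \p{C} is approximated by any Unicode general category starting with "C".

```python
import unicodedata

# Characters the answer lists as disallowed by NTFS.
FORBIDDEN = set('<>:"/\\|?*')
# Basic Latin through Latin Extended-B occupy U+0000..U+024F.
LAST_LATIN_EXTENDED_B = 0x024F


def matches_latin_filename(name: str) -> bool:
    """True if every character is in the allowed blocks and not excluded."""
    if not name:  # the regex requires one or more characters
        return False
    for ch in name:
        if ord(ch) > LAST_LATIN_EXTENDED_B:
            return False  # outside the four Latin blocks
        if unicodedata.category(ch).startswith("C"):
            return False  # control, format, and other "C" categories
        if ch in FORBIDDEN:
            return False  # explicitly disallowed punctuation
    return True
```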
Note that this matches a file name, not a file path. A file path is more complicated, since it's got a grammar to it, something like this, in crude ABNF:
filepath : relative-path
| absolute-path
| drivespecifier (relative-path|absolute-path)?
| unc-share (absolute-path)?
;
relative-path : filename ( directory-separator filename? )*
absolute-path : directory-separator ( filename? directory-separator )*
directory-separator : [/\]
drivespecifier : [a-zA-Z] ":"
unc-share : "\\" filename "\" filename absolute-path?
I'm trying to re-create Tijs' CurryOn16 example "TrafoFields" scraping the code from the video, but using the Java18.rsc grammar instead of his Java15.rsc. I've parsed the Example.java successfully in the repl, like he did in the video, yielding a var pt. I then try to do the transformation with trafoFields(pt). The response I get is:
|project://Rascal-Test/src/TrafoFields.rsc|(235,142,<12,9>,<16,11>): Syntax error: concrete syntax fragment
My TrafoFields.rsc looks like this:
module TrafoFields
import lang::java::\syntax::Java18;
/**
* - Make public fields private
* - add getters and setters
*/
start[CompilationUnit] trafoFields(start[CompilationUnit] cu) {
return innermost visit (cu) {
case (ClassBody)`{
' <ClassBodyDeclaration* cs1>
' public <Type t> <ID f>;
' <ClassBodyDeclaration* cs2>
'}`
=> (ClassBody)`{
' <ClassBodyDeclaration* cs1>
' private <Type t> <ID f>;
' public void <ID setter>(<Type t> x) {
' this.<ID f> = x;
' }
' public <Type t> <ID getter>() {
' return this.<ID f>;
' }
' <ClassBodyDeclaration* cs2>
'}`
when
ID setter := [ID]"set<f>",
ID getter := [ID]"get<f>"
}
}
The only deviation from Tijs' code is that I've changed ClassBodyDec* to ClassBodyDeclaration*, as the grammar has this as a non-terminal. Any hint what else could be wrong?
UPDATE
More non-terminal re-writing adapting to Java18 grammar:
Id => ID
Ah yes, that is the Achilles' heel of concrete syntax usability: parse errors.
Note that a generalized parser (such as GLL, which Rascal uses) simulates "unlimited lookahead", so a parse error may be reported a few characters or even a few lines after the actual cause (but never before!). Shortening the example (delta debugging) will therefore help localize the cause.
My way of working through this is:
First replace all pattern holes with concrete Java snippets. I know Java, so I should be able to write a correct fragment that would have matched the holes.
If there is still a parse error, check the top non-terminal. Is it the one you needed? Also make sure there is no extra whitespace before the start and after the end of the fragment inside the backquotes. Still a parse error? Write a shorter fragment for a sub-nonterminal first.
Parse error solved? That means one of the pattern holes was not syntactically correct. The type of the hole is leading here: it should be one of the non-terminals used in the grammar, literally, and of course at the right spot in the fragment. Add the holes back in one by one until you hit the error again. Then you know the cause and probably also the fix.
My program checks if an NSError object exists, and sends it to another method, like this:
if ([response isEqualToString:@""]) {
    [self handleError:commandError];
}
In handleError:, I try checking the localized description against an expected string like this:
- (void)handleError:(NSError *)error
{
    NSString *errorDescription = [error localizedDescription];
    NSLog(@"%@", errorDescription); // works fine
    if ([errorDescription isEqualToString:@"sudo: no tty present and no askpass program specified"]) {
        NSLog(@"SO Warning: Attempted to execute sudo command");
    }
}
However, the if statement isn't firing. The log outputs precisely the same thing I typed out in the if statement.
Unless you seriously think the If statement structure of iOS is broken, or the isEqualToString method implementation is broken, then the strings aren't the same and there is no mystery:
What you typed out is either using different characters (see: unicode and character encoding types) or there are invisible/nonprinting characters in your log output that you're not typing because you can't see them.
I'd suggest looping through the characters in your string and printing out their character code values:
for (NSUInteger i = 0; i < errorDescription.length; i++)
    NSLog(@"%lu: 0x%04x", (unsigned long)i, [errorDescription characterAtIndex:i]);
You'll find that the sequence of character codes in the string you typed is not equal to the sequence returned by the localizedDescription method.
As others have said, basing program logic on exact character strings you don't control and which can change without notice is likely not an optimum solution here. What about error codes?
I would suggest using error codes. Since you're using a library over which you have no control, its exposed interface should usually tell you which error codes are associated with each type of expected error. Using error codes would make your code more robust, clearer, and independent of message strings.
If you would still prefer to compare the string values because you have a good reason to do so, be aware of possible differences in punctuation, formatting characters such as newlines, and lowercase versus uppercase letters.
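The advice above about invisible characters can be illustrated with a short sketch. It is in Python rather than Objective-C so it runs standalone, and the trailing newline on the logged string is purely a hypothetical cause chosen for the demonstration; the actual difference in the asker's string is not known. The helper names are likewise made up for this example.

```python
def code_points(text: str) -> list[str]:
    """Render each character as its Unicode code point, e.g. 'U+000A'."""
    return [f"U+{ord(ch):04X}" for ch in text]


def first_difference(a: str, b: str) -> int:
    """Index of the first position where a and b differ, or -1 if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string is a prefix of the other (or they are identical).
    return -1 if len(a) == len(b) else min(len(a), len(b))


expected = "sudo: no tty present and no askpass program specified"
logged = expected + "\n"  # hypothetical: an invisible trailing newline

if __name__ == "__main__":
    print(expected == logged)                   # the exact comparison fails
    print(first_difference(expected, logged))   # where the strings diverge
    print(code_points(logged)[-3:])             # the offending code points
    print(logged.strip() == expected)           # comparison after trimming
```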
I'd like to run through a simple Rascal MPL parsing example, and am trying to follow Listing 1 from the Rascal Language Workbench (18531D.pdf) of May 3rd 2011. I've downloaded the current Rascal MPL version 0.5.1, and notice that a few module paths have changed. The following shows the content of my Entities.rsc:
module tut1::Entities
extend lang::std::Layout;
extend lang::std::Id;
extend Type;
start syntax Entities
= entities: Entity* entities;
syntax Entity
= @Foldable entity: "entity" Id name "{" Field* "}";
syntax Field
= field: Symbol Id name;
I'm assuming here that what was Name and Ident is now Id; and what was Type is now Symbol. I then continue as follows:
rascal>import tut1::Entities;
ok
rascal>import ParseTree;
ok
However, when I attempt to execute the crucial parse function, I receive the errors listed below. Where am I going wrong? (Despite the message I note that I can declare a Symbol variable at the Rascal prompt.)
rascal>parse(#Entities, "entity Person { string name integer age }");
Extending again?? ParseTree
Extending again?? Type
expanding parameterized symbols
generating stubs for regular
generating literals
establishing production set
generating item allocations
computing priority and associativity filter
printing the source code of the parser class
|prompt:///|(22,43,<1,22>,<1,65>): Java("Undeclared non-terminal: Symbol, in class: class org.rascalmpl.java.parser.object.$shell$")
org.rascalmpl.parser.gtd.SGTDBF.invokeExpects(SGTDBF.java:139)
org.rascalmpl.parser.gtd.SGTDBF.expandStack(SGTDBF.java:864)
org.rascalmpl.parser.gtd.SGTDBF.expand(SGTDBF.java:971)
org.rascalmpl.parser.gtd.SGTDBF.parse(SGTDBF.java:1032)
org.rascalmpl.parser.gtd.SGTDBF.parse(SGTDBF.java:1089)
org.rascalmpl.parser.gtd.SGTDBF.parse(SGTDBF.java:1082)
org.rascalmpl.interpreter.Evaluator.parseObject(Evaluator.java:493)
org.rascalmpl.interpreter.Evaluator.parseObject(Evaluator.java:544)
org.rascalmpl.library.Prelude.parse(Prelude.java:1644)
org.rascalmpl.library.Prelude.parse(Prelude.java:1637)
sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
somewhere in: $shell$
The example is out-of-date. Something like this would work better:
module tut1::Entities
extend lang::std::Layout; // for spaces and such
extend lang::std::Id; // for the Id non-terminal
start syntax Entities
= entities: Entity* entities;
syntax Entity
= @Foldable entity: "entity" Id name "{" Field* "}";
syntax Field
= field: Id symbol Id name; // now Id is used instead of Symbol and "symbol" is just the name of a slot in the rule
Some explanation:
the extend Type; line has gone; it was unnecessary and might be confusing
there was a missing definition for the Symbol non-terminal. I don't know what it was supposed to do; it should have been defined with syntax Symbol = ..., but it did not make sense to me, so instead I reused Id to define the type of a field.
the type checker (under development) would have warned you before using the parse function.
I have written parser_sub.mly and lexer_sub.mll, which can parse a subroutine. A subroutine is a block of statements enclosed by Sub and End Sub.
Actually, the raw file I would like to deal with contains a list of subroutines and some useless texts. Here is an example:
' a example file
Sub f1()
...
End Sub
haha
' hehe
Sub f2()
...
End Sub
So I need to write parser.mly and lexer.mll, which can parse this file by ignoring all the filler (e.g. haha, ' hehe, etc.) and calling parser_sub.main, and return a list of subroutines.
Could anyone tell me how to let the parser ignore all the useless sentences (sentences outside a Sub and End Sub)?
Here is a part of parser.mly I tried to write:
%{
open Syntax
%}
%start main
%type <Syntax.ev> main
%%
main:
subroutine_declaration* { $1 };
subroutine_declaration:
SUB name = subroutine_name LPAREN RPAREN EOS
body = procedure_body?
END SUB
{ { subroutine_name = name;
procedure_body_EOS_opt = body; } }
The rules and parsing for procedure_body are complex and are actually defined in parser_sub.mly and lexer_sub.mll, so how can I avoid redefining them in parser.mly and lexer.mll and just call parser_sub.main?
Maybe we can set some flag when we are inside subroutine:
sub_starts:
SUB { inside:=true };
sub_ends:
ENDSUB { inside:=false };
subroutine_declaration:
sub_starts name body sub_ends { ... }
And when this flag is not set you just skip any input?
If the stuff you want to skip can have any form (not necessarily valid tokens of your language), you pretty much have to solve this by hacking your lexer, as Kakadu suggests. This may be the easiest thing in any case.
If the filler (stuff to skip) consists of valid tokens, and you want to skip using a grammar rule, it seems to me the main problem is to define a nonterminal that matches any token other than END. This will be unpleasant to keep up to date, but seems possible.
Finally you have the problem that your end marker is two symbols, END SUB. You have to handle the case where you see END not followed by SUB. This is even trickier because SUB is your beginning marker also. Again, one way to simplify this would be to hack your lexer so that it treats END SUB as a single token. (Usually this is trickier than you'd expect, say if you want to allow comments between END and SUB.)
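As a rough illustration of the "skip everything outside Sub ... End Sub" idea, here is a line-based pre-pass sketched in Python rather than ocamllex. It is not a lexer hack as such, just the same state-flag logic applied to whole lines, and it assumes that Sub and End Sub each start a line of their own. The function name is hypothetical; each returned block would then be handed to the existing subroutine parser (parser_sub.main in the question).

```python
import re

# Assumption: a subroutine opens with a line starting with "Sub" and
# closes with a line starting with "End Sub" (any spacing between them).
SUB_START = re.compile(r"^\s*Sub\b", re.IGNORECASE)
SUB_END = re.compile(r"^\s*End\s+Sub\b", re.IGNORECASE)


def extract_subroutines(text: str) -> list[str]:
    """Return the text of each Sub ... End Sub block, dropping all filler."""
    blocks: list[str] = []
    current: list[str] | None = None  # None means "outside a subroutine"
    for line in text.splitlines():
        if current is None:
            if SUB_START.match(line):
                current = [line]  # entering a subroutine
            # any other line is filler and is skipped
        else:
            current.append(line)
            if SUB_END.match(line):
                blocks.append("\n".join(current))
                current = None  # back outside
    # An unterminated block at end of input is silently dropped here.
    return blocks
```

Treating "End Sub" with a single pattern sidesteps the two-token end marker problem described above, at the cost of not allowing comments between End and Sub.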
So I'm trying to parse a simple arithmetic dynamic expression using System.Linq.Dynamic.
This runs fine when executed in an English environment where the CurrentCulture is English-US (and the decimal separator is a plain "." dot).
Trying to run the code in a non English environment (e.g. Windows7x64 in Bulgarian, where the decimal separator is a "," comma), ParseLambda fails.
If I put "1.0" in my expression, ParseLambda fails in the Bulgarian environment with a ParseException saying "Invalid real literal '1.0'" (but does not fail in the English environment).
If I try to put "1,0" in my expression, ParseLambda fails with a ParseException saying "Syntax error" (this one fails in both environments).
Does anyone know a way around this?
Or am I missing something?
Or can I somehow set the culture of the parsed expression?
I need my app to run well on both environments..
My code is running on .NET v4.0 and I have System.Linq.Dynamic.dll (1.0.0.0) added as reference to the project.
Here's the code:
using System;
using System.Linq;
using System.Linq.Dynamic;
namespace DynamicExpressionTest
{
class Program
{
static void Main(string[] args)
{
//FAIL: ParseException: Invalid real literal '1.0' (fails only in non-English environment)
var expression1 = DynamicExpression.ParseLambda(
new System.Linq.Expressions.ParameterExpression[] { },
typeof(double),
"1.0 + 1.0");
var result1 = expression1.Compile().DynamicInvoke();
double resultD1 = System.Convert.ToDouble(result1);
Console.WriteLine(resultD1);
//FAIL: ParseException: Syntax error (fails both in English and non-English environments)
var expression2 = DynamicExpression.ParseLambda(
new System.Linq.Expressions.ParameterExpression[] { },
typeof(double),
"1,0 + 1,0");
var result2 = expression2.Compile().DynamicInvoke();
double resultD2 = System.Convert.ToDouble(result2);
Console.WriteLine(resultD2);
}
}
}
Thanks!
You can set the current culture before running that code. E.g. add this line before the code that only works with an English-style decimal separator (CultureInfo lives in the System.Globalization namespace, so it is fully qualified here):
System.Threading.Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("en-US");