I know the lexical analyser tokenizes the input and stores it in a stream, or at least that is what I understood. Unfortunately nearly all articles I have read only talk about lexing simple expressions. What I am interested in is how to tokenize something like:
if (fooBar > 5) {
for (var i = 0; i < alot.length; i++) {
fooBar += 2 + i;
}
}
Please note that this is pseudo code.
Question: I would like to know how the data structure looks like for tokens created by the lexer? I really have no idea for the example i gave above where code is nested. Some example would be nice.
First of all, tokens are not necessarily stored. Some compilers do store the tokens in a table or other data structure, but for a simple compiler (if there is such a thing) it's sufficient in most cases that the lexer can return the type of the next token to be parsed and then in some cases the parser might ask the lexer for the actual text that the token is made up of.
If we use your sample code,
if (fooBar > 5) {
for (var i = 0; i < alot.length; i++) {
fooBar += 2 + i;
}
}
The type of the first token in this sample might be defined as TOK_IF corresponding to the "if" keyword. The next token might be TOK_LPAREN, then TOK_IDENT, then TOK_GREATER, then TOK_INT_LITERAL, and so on. What exactly the types should be is defined by you as the author of the lexer (or tokenizer) code. (Note that there are about a million different tools to help you avoid the somewhat tedious task of coming up with these details by hand.)
Except for TOK_IDENT and TOK_INT_LITERAL the tokens we've seen so far are defined entirely by their type. For these two, we would need to be able to ask the lexer for the underlying text so that we can evaluate the value of the token.
So a tiny excerpt of the parser dealing with an IF statement in pseudo-code might look something like:
...
switch(lexer.GetNextTokenType())
case TOK_IF:
{
// "if" statement
if (lexer.GetNextTokenType() != TOK_LPAREN)
throw SyntaxError('( expected');
ParseRelationalExpression(lexer);
if (lexer.GetNextTokenType() != TOK_RPAREN)
throw SyntaxError(') expected');
...
and so on.
If the compiler did choose to actually store the tokens for later reference, and some compilers do e.g. to allow for more efficient backtracking, one way would be to use a structure similar to the following
struct {
int TokenType;
char* TokenStart;
int TokenLength;
}
The container for these might be a linked list or std::vector (assuming C++).
What are the benefits and drawbacks of the ?: operator as opposed to the standard if-else statement. The obvious ones being:
Conditional ?: Operator
Shorter and more concise when dealing with direct value comparisons and assignments
Doesn't seem to be as flexible as the if/else construct
Standard If/Else
Can be applied to more situations (such as function calls)
Often are unnecessarily long
Readability seems to vary for each depending on the statement. For a little while after first being exposed to the ?: operator, it took me some time to digest exactly how it worked. Would you recommend using it wherever possible, or sticking to if/else given that I work with many non-programmers?
I would basically recommend using it only when the resulting statement is extremely short and represents a significant increase in conciseness over the if/else equivalent without sacrificing readability.
Good example:
int result = Check() ? 1 : 0;
Bad example:
int result = FirstCheck() ? 1 : SecondCheck() ? 1 : ThirdCheck() ? 1 : 0;
This is pretty much covered by the other answers, but "it's an expression" doesn't really explain why that is so useful...
In languages like C++ and C#, you can define local readonly fields (within a method body) using them. This is not possible with a conventional if/then statement because the value of a readonly field has to be assigned within that single statement:
readonly int speed = (shiftKeyDown) ? 10 : 1;
is not the same as:
readonly int speed;
if (shifKeyDown)
speed = 10; // error - can't assign to a readonly
else
speed = 1; // error
In a similar way you can embed a tertiary expression in other code. As well as making the source code more compact (and in some cases more readable as a result) it can also make the generated machine code more compact and efficient:
MoveCar((shiftKeyDown) ? 10 : 1);
...may generate less code than having to call the same method twice:
if (shiftKeyDown)
MoveCar(10);
else
MoveCar(1);
Of course, it's also a more convenient and concise form (less typing, less repetition, and can reduce the chance of errors if you have to duplicate chunks of code in an if/else). In clean "common pattern" cases like this:
object thing = (reference == null) ? null : reference.Thing;
... it is simply faster to read/parse/understand (once you're used to it) than the long-winded if/else equivalent, so it can help you to 'grok' code faster.
Of course, just because it is useful does not mean it is the best thing to use in every case. I'd advise only using it for short bits of code where the meaning is clear (or made more clear) by using ?: - if you use it in more complex code, or nest ternary operators within each other it can make code horribly difficult to read.
I usually choose a ternary operator when I'd have a lot of duplicate code otherwise.
if (a > 0)
answer = compute(a, b, c, d, e);
else
answer = compute(-a, b, c, d, e);
With a ternary operator, this could be accomplished with the following.
answer = compute(a > 0 ? a : -a, b, c, d, e);
I find it particularly helpful when doing web development if I want to set a variable to a value sent in the request if it is defined or to some default value if it is not.
A really cool usage is:
x = foo ? 1 :
bar ? 2 :
baz ? 3 :
4;
Sometimes it can make the assignment of a bool value easier to read at first glance:
// With
button.IsEnabled = someControl.HasError ? false : true;
// Without
button.IsEnabled = !someControl.HasError;
I'd recommend limiting the use of the ternary(?:) operator to simple single line assignment if/else logic. Something resembling this pattern:
if(<boolCondition>) {
<variable> = <value>;
}
else {
<variable> = <anotherValue>;
}
Could be easily converted to:
<variable> = <boolCondition> ? <value> : <anotherValue>;
I would avoid using the ternary operator in situations that require if/else if/else, nested if/else, or if/else branch logic that results in the evaluation of multiple lines. Applying the ternary operator in these situations would likely result in unreadable, confusing, and unmanageable code. Hope this helps.
The conditional operator is great for short conditions, like this:
varA = boolB ? valC : valD;
I use it occasionally because it takes less time to write something that way... unfortunately, this branching can sometimes be missed by another developer browsing over your code. Plus, code isn't usually that short, so I usually help readability by putting the ? and : on separate lines, like this:
doSomeStuffToSomething(shouldSomethingBeDone()
? getTheThingThatNeedsStuffDone()
: getTheOtherThingThatNeedsStuffDone());
However, the big advantage to using if/else blocks (and why I prefer them) is that it's easier to come in later and add some additional logic to the branch,
if (shouldSomethingBeDone()) {
doSomeStuffToSomething(getTheThingThatNeedsStuffDone());
doSomeAdditionalStuff();
} else {
doSomeStuffToSomething(getTheOtherThingThatNeedsStuffDone());
}
or add another condition:
if (shouldSomethingBeDone()) {
doSomeStuffToSomething(getTheThingThatNeedsStuffDone());
doSomeAdditionalStuff();
} else if (shouldThisOtherThingBeDone()){
doSomeStuffToSomething(getTheOtherThingThatNeedsStuffDone());
}
So, in the end, it's about convenience for you now (shorter to use :?) vs. convenience for you (and others) later. It's a judgment call... but like all other code-formatting issues, the only real rule is to be consistent, and be visually courteous to those who have to maintain (or grade!) your code.
(all code eye-compiled)
One thing to recognize when using the ternary operator that it is an expression not a statement.
In functional languages like scheme the distinction doesn't exists:
(if (> a b) a b)
Conditional ?: Operator
"Doesn't seem to be as flexible as the if/else construct"
In functional languages it is.
When programming in imperative languages I apply the ternary operator in situations where I typically would use expressions (assignment, conditional statements, etc).
While the above answers are valid, and I agree with readability being important, there are 2 further points to consider:
In C#6, you can have expression-bodied methods.
This makes it particularly concise to use the ternary:
string GetDrink(DayOfWeek day)
=> day == DayOfWeek.Friday
? "Beer" : "Tea";
Behaviour differs when it comes to implicit type conversion.
If you have types T1 and T2 that can both be implicitly converted to T, then the below does not work:
T GetT() => true ? new T1() : new T2();
(because the compiler tries to determine the type of the ternary expression, and there is no conversion between T1 and T2.)
On the other hand, the if/else version below does work:
T GetT()
{
if (true) return new T1();
return new T2();
}
because T1 is converted to T and so is T2
If I'm setting a value and I know it will always be one line of code to do so, I typically use the ternary (conditional) operator. If there's a chance my code and logic will change in the future, I use an if/else as it's more clear to other programmers.
Of further interest to you may be the ?? operator.
The advantage of the conditional operator is that it is an operator. In other words, it returns a value. Since if is a statement, it cannot return a value.
There is some performance benefit of using the the ? operator in eg. MS Visual C++, but this is a really a compiler specific thing. The compiler can actually optimize out the conditional branch in some cases.
The scenario I most find myself using it is for defaulting values and especially in returns
return someIndex < maxIndex ? someIndex : maxIndex;
Those are really the only places I find it nice, but for them I do.
Though if you're looking for a boolean this might sometimes look like an appropriate thing to do:
bool hey = whatever < whatever_else ? true : false;
Because it's so easy to read and understand, but that idea should always be tossed for the more obvious:
bool hey = (whatever < whatever_else);
If you need multiple branches on the same condition, use an if:
if (A == 6)
f(1, 2, 3);
else
f(4, 5, 6);
If you need multiple branches with different conditions, then if statement count would snowball, you'll want to use the ternary:
f( (A == 6)? 1: 4, (B == 6)? 2: 5, (C == 6)? 3: 6 );
Also, you can use the ternary operator in initialization.
const int i = (A == 6)? 1 : 4;
Doing that with if is very messy:
int i_temp;
if (A == 6)
i_temp = 1;
else
i_temp = 4;
const int i = i_temp;
You can't put the initialization inside the if/else, because it changes the scope. But references and const variables can only be bound at initialization.
The ternary operator can be included within an rvalue, whereas an if-then-else cannot; on the other hand, an if-then-else can execute loops and other statements, whereas the ternary operator can only execute (possibly void) rvalues.
On a related note, the && and || operators allow some execution patterns which are harder to implement with if-then-else. For example, if one has several functions to call and wishes to execute a piece of code if any of them fail, it can be done nicely using the && operator. Doing it without that operator will either require redundant code, a goto, or an extra flag variable.
With C# 7, you can use the new ref locals feature to simplify the conditional assignment of ref-compatible variables. So now, not only can you do:
int i = 0;
T b = default(T), c = default(T);
// initialization of C#7 'ref-local' variable using a conditional r-value⁽¹⁾
ref T a = ref (i == 0 ? ref b : ref c);
...but also the extremely wonderful:
// assignment of l-value⁽²⁾ conditioned by C#7 'ref-locals'
(i == 0 ? ref b : ref c) = a;
That line of code assigns the value of a to either b or c, depending on the value of i.
Notes
1. r-value is the right-hand side of an assignment, the value that gets assigned.
2. l-value is the left-hand side of an assignment, the variable that receives the assigned value.
After updating to Xcode 7, my iOS App stopped compiling (due to several "changes"), however I've come up with something I don't know how to make any simpler. I need to sort an Array depending on the three following conditions:
appDelegate.all_breeds!.sortInPlace { (a, b) -> Bool in
(a.breedNameES == nil) ||
(b.breedNameES == nil) ||
(a.breedNameES! < b.breedNameES!)
}
Basically I need to see if either the breed A in spanish is nil, OR if the breed B in spanish is nil, otherwise I need to compare both breeds, and sort them in alphabetical order.
This had worked without problems until I updated to xCode 7, now the error I get is
Expression was too complex to be solved in reasonable time;
consider breaking up the expression into distinct subexpressions.
The problem is I can't break it into subexpressions, it's as basic as I can think of.
Any ideas how to solve this issue?
Appearently part of the problem was that Breed was a struct in my AppDelegate, which forced the compiler to iterate through all the structs to find it. This made that the expression took more time to compile.
The error message
Expression was too complex to be solved in reasonable time
can have various reasons. One is that the compiler could not infer the
types of expressions automatically from the context. In that case it
helps to add explicit type annotations. In your case
appDelegate.all_breeds!.sortInPlace { (a : AppDelegate.Breed, b : AppDelegate.Breed) -> Bool in
(a.breedNameES == nil) ||
(b.breedNameES == nil) ||
(a.breedNameES! < b.breedNameES!)
}
Suppose I want to print all locations with a hard coded value, in a context were d is an M3 Declaration:
top-down visit(d) {
case \number(str numberValue) :
println("hardcode value <numberValue> encountered at <d#\src>");
}
The problem is that the location <d#\src> is too generic, it yields the entire declaration (like the whole procedure). The expression <numberValue#\src> seems more appropriate, but it is not allowed, probably because we are too low in the parse tree.
So, the question is how to get a parent E (the closest parent, like the expression itself) of <numberValue> such that <E#\src> is defined?
One option is to add an extra level in the top-down traversal:
top-down visit(d) {
case \expressionStatement(Expression stmt): {
case \number(str numberValue) :
println("hardcode value <numberValue> encountered at <stmt#\src>");
}
}
This works, but has some drawbacks:
It is ugly, in order to cover all cases we have to add many more variants;
It is very inefficient.
So, what is the proper way to get the location of a numberValue (and similar low-level constructs like stringValue's, etc)?
You should be able to do the following:
top-down visit(d) {
case n:\number(str numberValue) :
println("hardcode value <numberValue> encountered at <n#\src>");
}
Saying n:\number(str numberValue) will bind the name n to the \number node that the case matched. You can then use this in your message to get the location for n. Just make sure that n isn't already in scope. You should be able to create similar patterns for the other scenarios you mentioned as well.
The Apple documentation says
Every switch statement must be exhaustive. That is, every possible
value of the type being considered must be matched by one of the
switch cases.
So in new Xcode I have placed a code like this
println(UInt16.min); // Output : '0'
println(UInt16.max); // Output : '65535'
var quantity : UInt16 = 10;
switch quantity {
case 0...65535: //OR case UInt16.min...UInt16.max:
println();
default:
println();
}
Now if i remove the default section I get a compiler error:
Switch must be exhaustive
Do you want to add missing cases? Fix
So my question is for a case that I have mentioned as case 0...65535: have I not mentioned all the case values for an UInt16 ?? But still I am getting an error ?? Why am I getting this error, Did i miss something ??
Swift only truly verifies that a switch block is exhaustive when working with enum types. Even a switching on Bool requires a default block in addition to true and false:
var b = true
switch b {
case true: println("true")
case false: println("false")
}
// error: switch must be exhaustive, consider adding a default clause
With an enum, however, the compiler is happy to only look at the two cases:
enum MyBool {
case True
case False
}
var b = MyBool.True
switch b {
case .True: println("true")
case .False: println("false")
}
If you need to include a default block for the compiler's sake but don't have anything for it to do, the break keyword comes in handy:
var b = true
switch b {
case true: println("true")
case false: println("false")
default: break
}
Part of why you see that error because the compiler can't verify that switch is exhaustive without running code. The expression 0...65535 creates a ClosedInterval struct, and when the switch statement executes it has to ask that struct if the value quantity is in the interval. There's room for that to change at run time, so the compiler can't check it at compile time. (See the Halting Problem.)
More generally, the compiler can't detect an exhaustive switch for integer values — even if you add specific cases for every integer value (case 0: ... case 1: ... ... case 65535:), it doesn't know your switch is exhaustive. (Theoretically it could, though: consider filing a feature request about this if it's something you'd like to see.)
As it stands, there are two scenarios where Swift can detect completeness and allow you to omit the default clause: enums and value binding in tuples. #NateCook's answer covers enums — if you switch on an enum value and have a case in your switch for every case in the enum, you don't need a default. You also don't need a default label if you switch on a tuple and bind every possible combination of values, as seen in the Swift book:
switch anotherPoint {
case (let x, 0):
println("on the x-axis with an x value of \(x)")
case (0, let y):
println("on the y-axis with a y value of \(y)")
case let (x, y):
println("somewhere else at (\(x), \(y))")
}
You might generalize this rule as "if the type system knows about the possible values of your type, it can detect switch completeness", but the fact that there's a level on which the type system doesn't know the range of possible (e.g.) UInt32 values is sort of splitting hairs...
Swift 4.1. Either you need to specify all cases or Just include default block inside switch statement.
(As of Swift 4.2, and probably earlier): I have a helper function that converts a Bool? into the selectedSegmentIndex for a UISegmentedControl with 2 segments. If the value is nil then neither segment should be selected. My function uses a switch, which returns the appropriate segment index for true or false values, and uses this to explicitly test for the nil and satisfy the compiler's need for it to be exhaustive:
case nil: // only remaining possible value
fallthrough
default:
return UISegmentedControl.noSegment
Technically, the case nil: fallthrough isn't required because the default: will suffice, but this syntax may be useful if you want to explicitly test a value to make the code more self-documenting, or perhaps in another situation.
Check if your enum was initialised as an optional which could be either case or nil