I'm creating an extension for String and I'm trying to decide what proper/expected/good behavior would be for a subscript operator. Currently, I have this:
// Will crash on 0 length strings
subscript(kIndex: Int) -> Character {
var index = kIndex
index = index < 0 ? 0 : index
index = index >= self.length ? self.length-1 : index
let i = self.startIndex.advancedBy(index)
return self.characters[i]
}
This causes all values outside the range of the string to be capped to the edge of the string. While this reduces crashing from passing a bad index to the subscript, it doesn't feel like the right thing to do. I am unable to throw an exception from a subscript and not checking the subscript causes a BAD_INSTRUCTION error if the index is out of bounds. The only other option I can think of is to return an optional, but that seems awkward. Weighing the options, what I have seems to be the most reasonable, but I don't think anybody using this would expect a bad index to return a valid result.
So, my question is: what is the "standard" expected behavior of the subscript operator and is returning a valid element from an invalid index acceptable/appropriate? Thanks.
If you're implementing a subscript on String, you might want to first think about why the standard library chooses not to.
When you call self.startIndex.advancedBy(index), you're effectively writing something like this:
var i = self.startIndex
while i < index { i = i.successor() }
This occurs because String.CharacterView.Index is not a random-access index type. See docs on advancedBy. String indices aren't random-access because each Character in a string may be any number of bytes in the string's underlying storage — you can't just get character n by jumping n * characterSize into the storage like you can with a C string.
So, if one were to use your subscript operator to iterate through the characters in a string:
for i in 0..<string.characters.count {
doSomethingWith(string[i])
}
... you'd have a loop that looks like it runs in linear time, because it looks just like an array iteration — each pass through the loop should take the same amount of time, because each one just increments i and uses a constant-time access to get string[i], right? Nope. The advancedBy call in first pass through the loop calls successor once, the next calls it twice, and so on... if your string has n characters, the last pass through the loop calls successor n times (even though that generates a result that was used in the previous pass through the loop when it called successor n-1 times). In other words, you've just made an O(n2) operation that looks like an O(n) operation, leaving a performance-cost bomb for whoever else uses your code.
This is the price of a fully Unicode-aware string library.
Anyhow, to answer your actual question — there are two schools of thought for subscripts and domain checking:
Have an optional return type: func subscript(index: Index) -> Element?
This makes sense when there's no sensible way for a client to check whether an index is valid without performing the same work as a lookup — e.g. for a dictionary, finding out if there's a value for a given key is the same as finding out what the value for a key is.
Require that the index be valid, and make a fatal error otherwise.
The usual case for this is situations where a client of your API can and should check for validity before accessing the subscript. This is what Swift arrays do, because arrays know their count and you don't need to look into an array to see if an index is valid.
The canonical test for this is precondition: e.g.
func subscript(index: Index) -> Element {
precondition(isValid(index), "index must be valid")
// ... do lookup ...
}
(Here, isValid is some operation specific to your class for validating an index — e.g. making sure it's > 0 and < count.)
In just about any use case, it's not idiomatic Swift to return a "real" value in the case of a bad index, nor is it appropriate to return a sentinel value — separating in-band values from sentinels is the reason Swift has Optionals.
Which of these is more appropriate for your use case is... well, since your use case is problematic to being with, it's sort of a wash. If you precondition that index < count, you still incur an O(n) cost just to check that (because a String has to examine its contents to figure out which sequences of bytes constitute each character before it knows how many characters it has). If you make your return type optional, and return nil after calling advancedBy or count, you've still incurred that O(n) cost.
Related
I'm trying to solve the following problem:
A string containing only lower-case letters can be encoded into NUM[encoded string] format. For example, aaa can be encoded into 3[a]. Given an encoded string, find its original string according to the following grammar.
S -> {E}
E -> NUM[S] | STR # NUM[S] means encoded, while STR means not.
NUM -> 1 | 2 | ... | 9
STR -> {LETTER}
LETTER -> a | b | ... | z
Note: in the above grammar {} represents "concatenate 0 or more times".
For example, given the encoded string 3[a2[c]], the result (original string) is accaccacc.
I think this can be parsed by recursive descent parsing, and there are two ways to implement it:
Method I: Let the parsing method to return the result string directly.
Method II: Use a global variable, and each parsing method can just append characters to it.
I'm wondering if the two methods share the same time complexity. Suppose the result string is of length t. Then for method II, I think its time complexity should be O(t) because we read and write every character in the result string exactly once. For method I, however, my intuition was that it could be slower because the same substring can be copied multiple times, depending on the depth of recursions. But I'm not able to figure out the exact time complexity to justify my intuition. Can anyone give a hint?
My first suggestion is that your parser should produce an abstract syntax tree rather than directly interpret the string, no matter whether you choose to write a recursive descent parser, a state-based parser or use a parser generator. This greatly enhances maintainability and allows you perform validation, analyses, and transformations much more easily.
Method I
If I understand you correctly, in Method I you have functions for each grammar construct that return an immutable string, which are then recursively repeated and concatenated. For example, for the top-level concatenation rule
S ::= E*
you would have an interpretation function that looks like this:
string interpretS(NodeS sNode) {
string result = "";
for (int i = 0; i < sNode.Expressions.Length; i++) {
result = result + interpretE(sNode.Expressions[i]);
}
return result;
}
... and similarly for the other rules. It is easy to see that the time complexity of Method I is O(n²) where n is the length of the output. (NB: It makes sense to measure the time complexity in terms of the output rather than the input, since the output length is exponential in the length of the input, and so any interpretation method must have time complexity at least exponential in the input, which is not very interesting.) For example, interpreting the input abcdef requires concatenating a and b, then concatenating the result with c, then concatenating that result with d etc., resulting in 1+2+3+4+5 steps. (See here for a more detailed discussion why repeated string concatenation with immutable strings has quadratic complexity.)
Method II
I interpret your description of Method II like this: instead of returning individual strings which have to be combined, you keep a reference to a mutable structure representing a string that supports appending. This could be a data structure like StringBuilder in Java or .NET, or just a dynamic-length list of characters. The important bit is that appending a string of length b to a string of length a can be done in O(b) (rather than O(a+b)).
Note that for this to work, you don't need a global variable! A cleaner solution would just pass the reference to the resulting structure through (this pattern is called accumulator parameter). So now we would have functions like these:
void interpretS2(NodeS sNode, StringBuilder accumulator) {
for (int i = 0; i < sNode.Expressions.Length; i++) {
interpretE2(sNode.Expressions[i], accumulator);
}
}
void interpretE2(NodeE eNode, StringBuilder accumulator) {
if (eNode is NodeNum numNode) {
for (int i = 0; i < numNode.Repetitions; i++) {
interpretS2(numNode.Expression, accumulator);
}
}
else if (eNode is NodeStr strNode) {
for (int i = 0; i < strNode.Letters.Length; i++) {
interpretLetter2(strNode.Letters[i], accumulator);
}
}
}
void interpretLetter2(NodeLetter letterNode, StringBuilder accumulator) {
accumulator.Append(letterNode.Letter);
}
...
As you stated correctly, here the time complexity is O(n), since at each step exactly one character of the output is appended to the accumulator, and no strings are ever copied (only at the very end, when the mutable structure is converted into the output string).
So, at least for this grammar, Method II is clearly preferable.
Update based on comment
Of course, my interpretation of Method I above is exceedingly naive. A more realistic implementation of the interpretS function would internally use a StringBuilder to concatenate the results from the subexpressions, resulting in linear complexity for the example given above, abcdef.
However, this wouldn't change the worst case complexity O(n²): consider the example
1[a1[b[1[c[1[d[1[e[1[f]]]]]]]]]]
Even the less naive version of Method I would first append f to e (1 step), then append ef to d (+ 2 steps), then append def to c (+ 3 steps) and so on, amounting to 1+2+3+4+5 steps in total.
The fundamental reason for the quadratic time complexity of Method I is that the results from the subexpressions are copied to create the new subresult to be returned.
Time Complexity is estimated by counting the number of elementary operations performed by an algorithm, supposing that each elementary operation takes a fixed amount of time to perform, see here. Of interest is, however, only how fast this number of operations increases, when the size of the input data set increases.
In your case, the size of the input data means the length of the string to be parsed.
I assume by your 1st method you mean that when a NUM is encountered, its argument is processed by the parser completely NUM times. In your example, when „3“ is read from the input string, „a2[c]“ is processed completely 3 times. Processing here means to transverse the syntax tree up to a leave, and append the leave value, here the „c“ to the output string.
I also assume by your 2nd method you mean that when a NUM is encountered, its argument is only evaluated once and all intermediate results are stored and re-used. In your example, when „3“ is read from the input string and stored, „a“ is read from the input string and stored, „2[c]“ is processed, i.e. „2“ is read from the input string and stored, and finally „c“ is processed. „c“ is due to the stored „2“ combined to „cc“, and due to the stored „a“ combined to „acc“. This is due to the stored „3“ combined then to „accaccacc“, and „accaccacc“ is output.
The question now is, what is the elementary operation that is relevant to the time complexity? My feeling is that in the 1st case, the stack operations during transversal of the syntax tree are important, while in the 2nd case, string copying operations are important.
Strictly speaking, one can thus not compare the time complexities of both algorithms.
If you are, however, interested in run times instead of time complexities, my guess is that the stack operations take more time than string copying, and that then method 2 is preferable.
I want to validate my inputs to function with some conditions and show as a compiler warning/error.
How it is possible?
For example:
func getPoints(start: Int, end: Int) {
}
I want to show compiler warning/error when someone tries to give input high for start than end.
getPoints(start: 3, end: 10) // No warnings
getPoints(start: 6, end: 2) // Compiler warning like: end value can not be less than start value
Actually this is for a framework purpose. I want to ensure that the parameters are not bad inputs.
Such a constraint can't be enforced at compile time. Take Range for example, which enforces that the lowerBound always compares as less or equal to the upperBound. That's just an assertion that runs at run-time, and crashes if it's not met.
I would suggest you just change your API design to use a Range<Int> or ClosedRange<Int> taking pairs of Ints to model ranges is a bad idea, for many reasons:
It doesn't communicate the semantics of a range. Two integers could be anything, but a range is something much more specific.
It doesn't have any of the useful methods, like contains(_:), or support for pattern matching via the ~= operator.
Its error prone, because when passing pairs around, you might make a copy/paste error leading you to accidentally use the same param twice.
It reads better: getPoint(3...10)
You can't generate a warning at compile time, since the arguments are not evaluated beyond checking for type conformance.
In your example, you have used constants, so it would, in theory, be possible to perform the check you want, but what if you passed a variable, or the result of another function? How much of your code would the compiler need to execute in order to perform the check?
You need to enforce your requirements at run time. For example, you could have your function throw if the parameters were incorrect:
enum MyErrors: Error {
case rangeError
}
func getPoints(start: Int, end: Int) throws {
guard start <= end else {
throw MyErrors.rangeError
}
...
}
Or you could have the function simply handle the problem:
func getPoints(start: Int, end: Int) {
let beginning = min(start,end)
let ending = max(start,end)
...
}
Also, I recommend Alexander's suggestion of using Range instead of Int; it is always a good idea to take advantage of Foundation types, but I will leave my answer as it shows some approaches for handling issues at runtime.
I have 2 signal producer like this
func textSignal() -> SignalProducer(String?,NoError)
and
func searchSignal(text:String) -> SignalProducer([User]?,NSError)
how to call searchSignal without nested function? since flatmap & attemptMap need the same error result like this case is NoError and NSError
There are 2 type differences that one must fix to be able to compose both functions.
The original signal can carry nils, and the function you're trying to flatMap it with doesn't accept nils. The type system is telling you that you need to choose a policy as to what to do in those cases. Some options:
Filter nils:
textSignal.filter { $0 != nil }.map { $0! }
Not recommended because you'll ignore those values, so if the user searches for "foo", and then the text field produces a nil string, the app would still show the search results for "foo".
Make the search function allow nils: this would be easy to do, but you're really just shifting the problem over to the other function, which would have to handle the nil values.
Treat nil strings as empty strings
textSignal.map { $0 ?? "" }
This is probably the simplest and the one that produces the most natural results.
The second difference is the error type. The original signal doesn't produce errors, but the second one can. Using the promoteErrors function we can turn the first function from NoError to NSError like this:
textSignal.promoteErrors(NSError)
This is safe to do with NoError signals because we know at compile time that they won't actually produce errors, and therefore no casting needs to happen to change it to NSError.
Are the following two examples equivalent?
Example 1:
let x = String::new();
let y = &x[..];
Example 2:
let x = String::new();
let y = &*x;
Is one more efficient than the other or are they basically the same?
In the case of String and Vec, they do the same thing. In general, however, they aren't quite equivalent.
First, you have to understand Deref. This trait is implemented in cases where a type is logically "wrapping" some lower-level, simpler value. For example, all of the "smart pointer" types (Box, Rc, Arc) implement Deref to give you access to their contents.
It is also implemented for String and Vec: String "derefs" to the simpler str, Vec<T> derefs to the simpler [T].
Writing *s is just manually invoking Deref::deref to turn s into its "simpler form". It is almost always written &*s, however: although the Deref::deref signature says it returns a borrowed pointer (&Target), the compiler inserts a second automatic deref. This is so that, for example, { let x = Box::new(42i32); *x } results in an i32 rather than a &i32.
So &*s is really just shorthand for Deref::deref(&s).
s[..] is syntactic sugar for s.index(RangeFull), implemented by the Index trait. This means to slice the "whole range" of the thing being indexed; for both String and Vec, this gives you a slice of the entire contents. Again, the result is technically a borrowed pointer, but Rust auto-derefs this one as well, so it's also almost always written &s[..].
So what's the difference? Hold that thought; let's talk about Deref chaining.
To take a specific example, because you can view a String as a str, it would be really helpful to have all the methods available on strs automatically available on Strings as well. Rather than inheritance, Rust does this by Deref chaining.
The way it works is that when you ask for a particular method on a value, Rust first looks at the methods defined for that specific type. Let's say it doesn't find the method you asked for; before giving up, Rust will check for a Deref implementation. If it finds one, it invokes it and then tries again.
This means that when you call s.chars() where s is a String, what's actually happening is that you're calling s.deref().chars(), because String doesn't have a method called chars, but str does (scroll up to see that String only gets this method because it implements Deref<Target=str>).
Getting back to the original question, the difference between &*s and &s[..] is in what happens when s is not just String or Vec<T>. Let's take a few examples:
s: String; &*s: &str, &s[..]: &str.
s: &String: &*s: &String, &s[..]: &str.
s: Box<String>: &*s: &String, &s[..]: &str.
s: Box<Rc<&String>>: &*s: &Rc<&String>, &s[..]: &str.
&*s only ever peels away one layer of indirection. &s[..] peels away all of them. This is because none of Box, Rc, &, etc. implement the Index trait, so Deref chaining causes the call to s.index(RangeFull) to chain through all those intermediate layers.
Which one should you use? Whichever you want. Use &*s (or &**s, or &***s) if you want to control exactly how many layers of indirection you want to strip off. Use &s[..] if you want to strip them all off and just get at the innermost representation of the value.
Or, you can do what I do and use &*s because it reads left-to-right, whereas &s[..] reads left-to-right-to-left-again and that annoys me. :)
Addendum
There's the related concept of Deref coercions.
There's also DerefMut and IndexMut which do all of the above, but for &mut instead of &.
They are completely the same for String and Vec.
The [..] syntax results in a call to Index<RangeFull>::index() and it's not just sugar for [0..collection.len()]. The latter would introduce the cost of bound checking. Gladly this is not the case in Rust so they both are equally fast.
Relevant code:
index of String
deref of String
index of Vec (just returns self which triggers the deref coercion thus executes exactly the same code as just deref)
deref of Vec
Should I return a optional value like:
func someFunc(#num: Int) -> Obj? {
if num < 0 {
return nil
}
...
}
Or just use assert:
func someFunc(#num: Int) -> Obj {
assert(num >= 0, "Number should greater or equal then zero")
...
}
Edit: Now the conditions are identical in two cases, the number should greater or equal then 0. Negative values are not permitted.
If you use assert and the caller passes an invalid argument it is a non-recoverable error/crash. The caller may not be aware of all the ways the assert may be caused, that is internal logic the caller is not supposed to know.
Really the only time assert is meaningful is to check the calling arguments on method entry and even in that case it must be made clear to the user exactly what is invalid and that can never be made more stringent for the life of the method.
Since this is about Swift returning an Optional seems to make the most sense and it will be clear to the caller that a possible error must be handled. Optionals are a major feature of Swift, use them.
Or always return a useful result the way atan() handles being called with ±0 and ±Inf.
It depends on what you want the precondition of the function to be: if it is an error call it with a negative value (or non-positive; your two examples are contradictory), then go with the assert and document this. Then it becomes part of the contract that the user of the function must check the value if it's uncertain. Or if it makes more sense to support these values and return nil (e.g., the function will typically be called with such values and the nil is not an issue for that typical use), do that instead… Without knowing the details it's impossible to tell which suits best, but my guess would be the former.