I'm struggling to express my code in a way that pleases the borrow checker.
I have a function create_task which creates a future of some database operations. There's a stream of values where each element needs to be inserted into a database within a transaction. The problem is sharing the transaction between multiple closures, since it also holds a mutable borrow of the connection object.
#![feature(conservative_impl_trait)]
extern crate futures;
extern crate rusqlite;
use futures::prelude::*;
use futures::{future, stream};
use rusqlite::Connection;
fn main() {
let task = create_task();
task.wait().unwrap();
}
fn create_task() -> impl Future<Item = (), Error = ()> {
let mut conn = Connection::open("temp.db").unwrap();
conn.execute("CREATE TABLE IF NOT EXISTS temp (val INTEGER)", &[]).unwrap();
// tx takes a mut ref to conn!
let tx = conn.transaction().unwrap();
stream::iter_ok::<_, ()>(vec![1, 2, 3])
.for_each(|val| {
// tx borrowed here!
tx.execute("INSERT INTO temp (val) VALUES (?1)", &[&val]).unwrap();
future::ok(())
})
.map(|_| {
// tx moved/consumed here!
tx.commit().unwrap();
})
}
There are multiple issues with the code:
conn does not live long enough. It needs to be moved into the closures as well. Perhaps as an Rc<Connection>, since there are two closures?
conn can't simply be shared as an Rc because of the mutability requirements. Perhaps Rc<RefCell<Connection>> is a more suitable type?
the borrow checker does not know that the borrow of tx ends after the first for_each closure, so it cannot be moved into the second map closure. Once again, moving it as an Rc<Transaction> into both closures might be reasonable?
I've been fiddling around with those ideas and know that the desired lifetimes are possible and make sense, but have not been able to express my code to the compiler in a correct way.
I believe your first problem is that you haven't fully grasped how lazy futures are. You are creating a Connection inside of create_task, taking a reference to it, putting that reference into a stream/future, then attempting to return that future. None of the closures have even executed at this point.
You cannot return a reference to a value created in a function. Don't try to store the transaction and the connection in the same struct, either.
Instead, accept a reference to a Connection and return a Future containing that lifetime.
The next problem is that the compiler doesn't know how the closures will be called or in what order. Instead of trying to close over the transaction, let it "flow" from one to the other, letting the ownership system ensure that it's always in the right place.
#![feature(conservative_impl_trait)]
extern crate futures;
extern crate rusqlite;
use futures::prelude::*;
use futures::{future, stream};
use rusqlite::Connection;
fn main() {
let mut conn = Connection::open("temp.db").unwrap();
conn.execute("CREATE TABLE IF NOT EXISTS temp (val INTEGER)", &[]).unwrap();
let task = create_task(&mut conn);
task.wait().unwrap();
}
fn create_task<'a>(conn: &'a mut rusqlite::Connection) -> impl Future<Item = (), Error = ()> + 'a {
let tx = conn.transaction().unwrap();
stream::iter_ok::<_, ()>(vec![1, 2, 3])
.fold(tx, |tx, val| {
tx.execute("INSERT INTO temp (val) VALUES (?1)", &[&val]).unwrap();
future::ok(tx)
})
.map(move |tx| {
tx.commit().unwrap();
})
}
A giant word of warning: If execute isn't asynchronous, you really shouldn't be using it inside of a future like that. Any blocking operation will cause all of your futures to stall out. You probably should be running the synchronous workload on a separate thread / threadpool.
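For illustration, here is one way to push the blocking work onto a dedicated thread pool and still get a future back. This is only a minimal sketch, assuming the futures-cpupool crate; the pool size and the Ok(42) placeholder stand in for the real synchronous database call:
extern crate futures;
extern crate futures_cpupool;

use futures::Future;
use futures_cpupool::CpuPool;

fn main() {
    // A small pool reserved for blocking work; the size is arbitrary here.
    let pool = CpuPool::new(4);

    // `spawn_fn` runs the closure on a pool thread and returns a future,
    // so the blocking call never stalls the rest of the event loop.
    let task = pool.spawn_fn(|| {
        // ... do the synchronous database work here ...
        Ok::<_, ()>(42)
    });

    assert_eq!(task.wait().unwrap(), 42);
}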
See also:
Is there any way to return a reference to a variable created in a function?
How to store a SqliteConnection and SqliteStatement objects in the same struct in Rust?
Why can't I store a value and a reference to that value in the same struct?
This question was also asked on F# Slack, but since that's not visible to the wider community, and since I don't have a solution yet, I figured it made sense to ask it here.
Basically, String.Create works with Span to create a new string, and takes a delegate to fill the span. This is generally faster than building a string the usual way: creating a char array, calling AsSpan, and passing the result to the appropriate String constructor.
This works fine:
let create value =
    String.Create(12, value, fun buf v ->
        for i = 0 to buf.Length - 1 do
            buf.[i] <- v)
But the minute I try to use a function that's curried, or try to use a delegate, I get a type error. According to F#, the delegate is of type SpanAction<'T, 'TArg>, where 'T translates to Span<'T> in the callback, and 'TArg is any argument you can use to prevent a closure.
I've tried several variants, but all come down to this basic idea:
type SpanDelegate<'T, 'TArg> = delegate of Span<'T> * 'TArg -> unit

let callback =
    new SpanDelegate<char, char>(fun buf v ->
        for i = 0 to buf.Length - 1 do buf.[i] <- v)

let create (value: char) =
    String.Create(12, value, callback) // type error on callback
Whether or not I use it with the original lambda, I can't seem to get it to work. Ideas are welcome :).
The reason why your code does not work is that you are creating a custom delegate SpanDelegate, but the code expects a pre-defined delegate SpanAction.
I'm not exactly sure I understand your issue correctly (and so I may be missing something), but if you create the pre-defined SpanAction delegate, then everything seems to be working fine:
open System
open System.Buffers
let callback =
    SpanAction<char, char>(fun buf v ->
        for i = 0 to buf.Length - 1 do buf.[i] <- v)

let create (value: char) =
    String.Create(12, value, callback)
I gave VisualRust another try, and to see how far they've got, I wrote a few lines of code. As usual, the code caused me to write a question on Stack Overflow...
Code first, read my question later:
fn make_counter( state : &mut u32 ) -> Box<Fn()->u32>
{
Box::new(move || {let ret = *state; *state = *state + 1; ret })
}
fn test_make_counter() {
let mut cnt : u32 = 0;
{
let counter = make_counter( & mut cnt );
let x1 = counter();
let x2 = counter();
println!("x1 = {} x2 = {}",x1,x2);
}
}
fn alt_make_counter ( init : u32 ) -> Box<Fn()->u32> {
let mut state = init;
Box::new(move || {let ret = state; state = state + 1; ret })
}
fn test_alt_make_counter() {
let counter = alt_make_counter( 0u32 );
let x1 = counter();
let x2 = counter();
println!("x1 = {} x2 = {}",x1,x2);
}
fn main() {
test_make_counter();
test_alt_make_counter();
}
The difference between make_counter() and alt_make_counter() is that in one case the state is a pointer to a mutable u32 passed into the function, and in the other case it is a mutable u32 defined inside the function. As test_make_counter() clearly shows, there is no way the closure can live longer than the variable cnt. Even if I removed the block inside test_make_counter(), they would still have identical lifetimes. With the block, counter will die before cnt. And yet, Rust complains:
src\main.rs(4,2): error : captured variable state does not outlive the enclosing closure
src\main.rs(3,1): warning : note: captured variable is valid for the anonymous lifetime #1 defined on the block at 3:0
If you look at the alt_make_counter() function now, the lifetime of state should basically cause the same error message, right? If the code captures the state for the closure, it should not matter whether the pointer is passed in or the variable is bound inside the function, right? But obviously, these two cases are magically different.
Can anyone explain why they are different (bug, feature, deep insight, ...?), and is there a simple rule one can adopt to avoid wasting time on such issues every now and then?
The difference is not in using a local variable vs. using a parameter. Parameters are perfectly ordinary locals. In fact, this version of alt_make_counter works [1]:
fn alt_make_counter (mut state: u32) -> Box<FnMut() -> u32> {
Box::new(move || {let ret = state; state = state + 1; ret })
}
The problem is that the closure in make_counter closes over a &mut u32 instead of a u32. It doesn't have its own state; it uses an integer somewhere else as its scratch space. And thus it needs to worry about the lifetime of that location. The function signature needs to communicate that the closure can only work while it can still use the reference that was passed in. This can be expressed with a lifetime parameter:
fn make_counter<'a>(state: &'a mut u32) -> Box<FnMut() -> u32 + 'a> {
Box::new(move || {let ret = *state; *state = *state + 1; ret })
}
Note that 'a is also attached to the FnMut() -> u32 (though with a different syntax because it's a trait).
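For reference, here is how the caller side of the question's first test looks with this signature; note that the closure now has to be bound as mut, because calling an FnMut mutates its captured state (a small self-contained sketch):
fn make_counter<'a>(state: &'a mut u32) -> Box<FnMut() -> u32 + 'a> {
    Box::new(move || { let ret = *state; *state = *state + 1; ret })
}

fn main() {
    let mut cnt: u32 = 0;
    {
        // `mut` is required in order to call the boxed FnMut.
        let mut counter = make_counter(&mut cnt);
        let x1 = counter();
        let x2 = counter();
        println!("x1 = {} x2 = {}", x1, x2);
    }
    // The mutable borrow ends with the inner block, so `cnt` is usable again.
    println!("cnt = {}", cnt);
}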
The simplest rule for avoiding such trouble is to not use references when they cause problems. There is no good reason for this closure to borrow its state, so don't do it. I don't know whether this applies to you, but I've seen a number of people who were under the impression that &mut is the primary or only way to mutate something. That is wrong. You can store the value directly and mutate it in place, by keeping it (or the larger structure that contains it) in a local variable tagged as mut. A mutable reference is only useful if the result of the mutation needs to be shared with some other code and you can't just pass the new value to that code.
Of course, sometimes juggling references in complicated ways is necessary. Unfortunately there doesn't seem to be a quick and easy way to learn to deal with those confidently. It's a big pedagogic challenge, but so far it appears everyone just struggled for a while and then progressively had fewer problems as they got more experienced. No, there is no single simple rule that solves all lifetime woes.
[1] The return type has to be FnMut in all cases. You just didn't get an error about that yet because your current error happens at an earlier stage in the compilation.
I have the following tweet stream class. It has the TweetReceived event which can be used with the other components of my system.
It seems to work ok but I have the feeling that it's more complicated than it should be.
Are there any tools out there to give me this functionality without having to implement the mbox/event mechanism by myself?
Also, would you recommend using asyncSeq instead of IObservable?
Thanks!
type TweetStream (cfg: oauth.Config) =
    let token = TwitterToken.Token (cfg.accessToken,
                                    cfg.accessTokenSecret,
                                    cfg.appKey,
                                    cfg.appSecret)
    let stream = new SimpleStream("https://stream.twitter.com/1.1/statuses/sample.json")
    let event = new Event<_>()
    let agent = MailboxProcessor.Start(fun mbox ->
        let rec loop () =
            async {
                let! msg = mbox.Receive()
                do event.Trigger(msg)
                return! loop ()
            }
        loop ())

    member x.TweetReceived = event.Publish

    member x.Start () =
        Task.Factory.StartNew(fun () -> stream.StartStream(token, agent.Post))
        |> ignore

    member x.Stop = stream.StopStream
UPDATE:
Thanks Thomas for the quick (as always) answer to the second question.
My first question may have been a little unclear, so I refactored the code to make the class AgentEvent visible, and I'll rephrase the first question: is there a way to implement the logic in AgentEvent more simply? Is this logic already implemented somewhere?
I'm asking this because it feels like a common usage pattern.
type AgentEvent<'t>() =
    let event = new Event<'t>()
    let agent = MailboxProcessor.Start(fun mbox ->
        let rec loop () =
            async {
                let! msg = mbox.Receive()
                do event.Trigger(msg)
                return! loop ()
            }
        loop ())

    member x.Event = event.Publish
    member x.Post = agent.Post

type TweetStream (cfg: oauth.Config) =
    let token = TwitterToken.Token (cfg.accessToken,
                                    cfg.accessTokenSecret,
                                    cfg.appKey,
                                    cfg.appSecret)
    let stream = new SimpleStream("https://stream.twitter.com/1.1/statuses/sample.json")
    let agentEvent = AgentEvent()

    member x.TweetReceived = agentEvent.Event

    member x.Start () =
        Task.Factory.StartNew(fun () -> stream.StartStream(token, agentEvent.Post))
        |> ignore

    member x.Stop = stream.StopStream
I think that IObservable is the right abstraction for publishing the events. As for processing them, I would use either Reactive Extensions or F# Agents (MailboxProcessor), depending on what you want to do.
Note that F# automatically represents events as IObservable values (actually IEvent, but that inherits from observable), so you can use Reactive Extensions directly on TweetReceived.
What is the right representation?
The main point of asyncSeq is that it lets you control how quickly the data is generated. It is like async in that you have to start it to actually do the work and get a value, so it is useful when you can start some operation (e.g. downloading the next few bytes) to get the next value.
IObservable is useful when you have no control over the data source: it just keeps producing values and you have no way to pause it. That seems more appropriate for tweets.
As for processing, I think that Reactive Extensions are nice when they already implement the operations you need. When you need to write some custom logic (that is not easily expressed in Rx), using Agent is a great way to write your own Rx-like functions.
I am very, very new to Rust and trying to implement some simple things to get the feel for the language. Right now, I'm stumbling over the best way to implement a class-like struct that involves casting a string to an int. I'm using a global-namespaced function and it feels wrong to my Ruby-addled brain.
What's the Rustic way of doing this?
use std::io;
struct Person {
name: ~str,
age: int
}
impl Person {
fn new(input_name: ~str) -> Person {
Person {
name: input_name,
age: get_int_from_input(~"Please enter a number for age.")
}
}
fn print_info(&self) {
println(fmt!("%s is %i years old.", self.name, self.age));
}
}
fn get_int_from_input(prompt_message: ~str) -> int {
println(prompt_message);
let my_input = io::stdin().read_line();
let my_val =
match from_str::<int>(my_input) {
Some(number_string) => number_string,
_ => fail!("got to put in a number.")
};
return my_val;
}
fn main() {
let first_person = Person::new(~"Ohai");
first_person.print_info();
}
This compiles and has the desired behaviour, but I am at a loss for what to do here--it's obvious I don't understand the best practices or how to implement them.
Edit: this is 0.8
Here is my version of the code, which I have made more idiomatic:
use std::io;
struct Person {
name: ~str,
age: int
}
impl Person {
fn print_info(&self) {
println!("{} is {} years old.", self.name, self.age);
}
}
fn get_int_from_input(prompt_message: &str) -> int {
println(prompt_message);
let my_input = io::stdin().read_line();
from_str::<int>(my_input).expect("got to put in a number.")
}
fn main() {
let first_person = Person {
name: ~"Ohai",
age: get_int_from_input("Please enter a number for age.")
};
first_person.print_info();
}
fmt!/format!
First, Rust is deprecating the fmt! macro, with printf-based syntax, in favor of format!, which uses syntax similar to Python format strings. The new version, Rust 0.9, will complain about the use of fmt!. Therefore, you should replace fmt!("%s is %i years old.", self.name, self.age) with format!("{} is {} years old.", self.name, self.age). However, we have a convenience macro println!(...) that means exactly the same thing as println(format!(...)), so the most idiomatic way to write your code in Rust would be
println!("{} is {} years old.", self.name, self.age);
Initializing structs
For a simple type like Person, it is idiomatic in Rust to create instances of the type by using the struct literal syntax:
let first_person = Person {
name: ~"Ohai",
age: get_int_from_input("Please enter a number for age.")
};
In cases where you do want a constructor, Person::new is the idiomatic name for a 'default' constructor (by which I mean the most commonly used constructor) for a type Person. However, it would seem strange for the default constructor to require initialization from user input. Usually, I think you would have a person module, for example (with person::Person exported by the module). In this case, I think it would be most idiomatic to use a module-level function fn person::prompt_for_age(name: ~str) -> person::Person. Alternatively, you could use a static method on Person -- Person::prompt_for_age(name: ~str).
&str vs. ~str in function parameters
I've changed the signature of get_int_from_input to take a &str instead of ~str. ~str denotes a string allocated on the exchange heap -- in other words, the heap that malloc/free in C, or new/delete in C++ operate on. Unlike in C/C++, however, Rust enforces the requirement that values on the exchange heap can only be owned by one variable at a time. Therefore, taking a ~str as a function parameter means that the caller of the function can't reuse the ~str argument that it passed in -- it would have to make a copy of the ~str using the .clone method.
On the other hand, &str is a slice into the string, which is just a reference to a range of characters in the string, so it doesn't require a new copy of the string to be allocated when a function with a &str parameter is called.
The reason to use &str rather than ~str for prompt_message in get_int_from_input is that the function doesn't need to hold onto the message past the end of the function. It only uses the prompt message in order to print it (and println takes a &str, not a ~str). Once you change the function to take &str, you can call it like get_int_from_input("Prompt") instead of get_int_from_input(~"Prompt"), which avoids the unnecessary allocation of "Prompt" on the heap (and similarly, you can avoid having to clone s in the code below):
let s: ~str = ~"Prompt";
let i = get_int_from_input(s.clone());
println(s); // Would complain that `s` is no longer valid without cloning it above
// if `get_int_from_input` takes `~str`, but not if it takes `&str`.
Option<T>::expect
The Option<T>::expect method is the idiomatic shortcut for the match statement you have, where you want to either return x if you get Some(x) or fail with a message if you get None.
Returning without return
In Rust, it is idiomatic (following the example of functional languages like Haskell and OCaml) to return a value without explicitly writing a return statement. In fact, the return value of a function is the result of the last expression in the function, unless the expression is followed by a semicolon (in which case it returns (), a.k.a. unit, which is essentially an empty placeholder value -- () is also what is returned by functions without an explicit return type, such as main or print_info).
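As a tiny illustration (the function names here are made up; int is the pre-1.0 integer type used throughout this answer):
fn add_one(x: int) -> int {
    x + 1      // no trailing semicolon: this expression is the function's return value
}

fn add_one_discarded(x: int) -> () {
    x + 1;     // trailing semicolon: the value is discarded and `()` is returned
}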
Conclusion
I'm not a great expert on Rust by any means. If you want help on anything related to Rust, you can try, in addition to Stack Overflow, the #rust IRC channel on irc.mozilla.org or the Rust subreddit.
This isn't really Rust-specific, but try to split functionality into discrete units. Don't mix the low-level tasks of putting strings on the terminal and getting strings from the terminal with the more directly relevant (and largely implementation-dependent) tasks of requesting a value and validating it. When you do that, the design decisions you should make start to arise on their own.
For instance, you could write something like this (I haven't compiled it, and I'm new to Rust myself, so there's probably at LEAST one thing wrong with this :) ).
fn validated_input_prompt<T: FromStr>(prompt: &str) -> T {
    println(prompt);
    loop {
        // Read a fresh line on every pass; otherwise invalid input
        // would be re-parsed (and rejected) forever.
        let res = io::stdin().read_line();
        match res.len() {
            0 => { continue; }
            _ => {
                match from_str::<T>(res) {
                    Some(t) => {
                        return t;
                    },
                    None => {
                        println("ERROR. Please try again.");
                        println(prompt);
                    }
                }
            }
        }
    }
}
And then use it as:
validated_input_prompt::<int>("Enter a number:")
or:
validated_input_prompt::<char>("Enter a Character:")
BUT, to make the latter work, you'd need to implement FromStr for chars, because (sadly) rust doesn't seem to do it by default. Something LIKE this, but again, I'm not really sure of the rust syntax for this.
use std::from_str::FromStr;

impl FromStr for char {
    fn from_str(s: &str) -> Option<char> {
        // Only a string containing exactly one character can become a char.
        match s.char_len() {
            1 => Some(s.char_at(0)),
            _ => None
        }
    }
}
A variation of telotortium's input reading function that doesn't fail on bad input. The loop { ... } keyword is preferred over writing while true { ... }. In this case using return is fine since the function is returning early.
fn int_from_input(prompt: &str) -> int {
println(prompt);
loop {
match from_str::<int>(io::stdin().read_line()) {
Some(x) => return x,
None => println("Oops, that was invalid input. Try again.")
};
}
}
I am writing a compiler of mini-pascal in Ocaml. I would like my compiler to accept the following code for instance:
program test;
var
a,b : boolean;
n : integer;
begin
...
end.
I'm having difficulty dealing with the declaration of variables (the part following var). At the moment, the type of variables is defined like this in sib_syntax.ml:
type s_var =
{ s_var_name: string;
s_var_type: s_type;
s_var_uniqueId: s_uniqueId (* key *) }
Where s_var_uniqueId (instead of s_var_name) is the unique key for variables. My first question is where and how I should implement the mechanism of generating a new id (by increasing the largest id by 1) every time I encounter a new variable. I wonder whether I should implement it in sib_parser.mly, which probably involves a static variable cur_id and modifications to the binding part, but I don't know how to realize that in the .mly. Or should I implement the mechanism at the next stage, in interpreter.ml? In that case, the question is how to keep the .mly consistent with the type s_var: what s_var_uniqueId should I provide in the binding part?
Another question is about this part of statement in .mly:
id = IDENT COLONEQ e = expression
{ Sc_assign (Sle_var {s_var_name = id; s_var_type = St_void}, e) }
Here I also need to provide the next level (interpreter.ml) with a variable of which I only know the s_var_name, so what should I do about its s_var_type and s_var_uniqueId here?
Could anyone help? Thank you very much!
The first question to ask yourself is whether you actually need an unique id. From my experience, they're almost never necessary or even useful. If what you're trying to do is making variables unique through alpha-equivalence, then this should happen after parsing is complete, and will probably involve some form of DeBruijn indices instead of unique identifiers.
Either way, a function which returns a new integer identifier every time it is called is:
let unique =
let last = ref 0 in
fun () -> incr last ; !last
let one = unique () (* 1 *)
let two = unique () (* 2 *)
So, you can simply assign { ... ; s_var_uniqueId = unique () } in your Menhir rules.
The more important problem you're trying to solve here is that of variable binding. Variable x is defined in one location and used in another, and you need to determine that it happens to be the same variable in both places. There are many ways of doing this, one of them being to delay the binding until the interpreter. I'm going to show you how to deal with this during parsing.
First, I'm going to define a context: it's a set of variables that allows you to easily retrieve a variable based on its name. You might want to create it with hash tables or maps, but to keep things simple I will be using List.assoc here.
type s_context = {
s_ctx_parent : s_context option ;
s_ctx_bindings : (string * (int * s_type)) list ;
s_ctx_size : int ;
}
let empty_context parent = {
s_ctx_parent = parent ;
s_ctx_bindings = [] ;
s_ctx_size = 0
}
let bind v_name v_type ctx =
  (* [List.assoc] takes the key first, then the association list. *)
  try let _ = List.assoc v_name ctx.s_ctx_bindings in
      failwith "Variable is already defined"
  with Not_found ->
    { ctx with
      s_ctx_bindings = (v_name, (ctx.s_ctx_size, v_type))
                       :: ctx.s_ctx_bindings ;
      s_ctx_size = ctx.s_ctx_size + 1 }
let rec find v_name ctx =
  try 0, List.assoc v_name ctx.s_ctx_bindings
  with Not_found ->
    match ctx.s_ctx_parent with
    | Some parent -> let depth, found = find v_name parent in
                     depth + 1, found
    | None -> failwith "Variable is not defined"
So, bind adds a new variable to the current context, find looks for a variable in the current context and its parents, and returns both the bound data and the depth at which it was found. So, you could have all global variables in one context, then all parameters of a function in another context that has the global context as its parent, then all local variables in a function (when you'll have them) in a third context that has the function's main context as the parent, and so on.
So, for instance, find 'x' ctx will return something like 0, (3, St_int) where 0 is the DeBruijn index of the variable, 3 is the position of the variable in the context identified by the DeBruijn index, and St_int is the type.
type s_var = {
s_var_deBruijn: int;
s_var_type: s_type;
s_var_pos: int
}
let find v_name ctx =
let deBruijn, (pos, typ) = find v_name ctx in
{ s_var_deBruijn = deBruijn ;
s_var_type = typ ;
s_var_pos = pos }
Of course, you need your functions to store their context, and make sure that the first argument is the variable at position 0 within the context:
type s_fun =
  { s_fun_name: string;
    s_fun_type: s_type;
    s_fun_params: s_context;
    s_fun_body: s_block; }
let context_of_paramlist parent paramlist =
List.fold_left
(fun ctx (v_name,v_type) -> bind v_name v_type ctx)
(empty_context parent)
paramlist
Then, you can change your parser to take into account the context. The trick is that instead of returning an object representing part of your AST, most of your rules will return a function that takes a context as an argument and returns an AST node.
For instance:
int_expression:
(* Constant : ignore the context *)
| c = INT { fun _ -> Se_const (Sc_int c) }
(* Variable : look for the variable inside the context *)
| id = IDENT { fun ctx -> Se_var (find id ctx) }
(* Subexpressions : pass the context to both *)
| e1 = int_expression o = operator e2 = int_expression
{ fun ctx -> Se_binary (o, e1 ctx, e2 ctx) }
;
So, you simply propagate the context "down" recursively through the expressions. The only clever parts are those when new contexts are created (you don't have this syntax yet, so I'm just adding a placeholder):
| function_definition_expression (args, body)
{ fun ctx -> let ctx = context_of_paramlist (Some ctx) args in
{ s_fun_params = ctx ;
s_fun_body = body ctx } }
As well as the global context (the program rule itself does not return a function, but the block rule does, and so a context is created from the globals and provided).
prog:
PROGRAM IDENT SEMICOLON
globals = variables
main = block
DOT
{ let ctx = context_of_paramlist None globals in
{ globals = ctx;
main = main ctx } }
All of this makes the implementation of your interpreter much easier due to the DeBruijn indices: you can have a "stack" which holds your values (of type value) defined as:
type stack = value array list
Then, reading and writing variable x is as simple as:
let read stack x =
(List.nth stack x.s_var_deBruijn).(x.s_var_pos)
let write stack x value =
(List.nth stack x.s_var_deBruijn).(x.s_var_pos) <- value
Also, since we made sure that function parameters are in the same order as their position in the function context, if you want to call function f and its arguments are stored in the array args, then constructing the stack is as simple as:
let inner_stack = args :: stack in
(* Evaluate f.s_fun_body with inner_stack here *)
But I'm sure you'll have a lot more questions to ask when you start working on your interpreter ;)
How to create a global id generator:
let unique =
let counter = ref (-1) in
fun () -> incr counter; !counter
Test:
# unique ();;
- : int = 0
# unique ();;
- : int = 1
Regarding your more general design question: it seems that your data representation does not faithfully represent the compiler phases. If you must return a type-aware data-type (with this field s_var_type) after the parsing phase, something is wrong. You have two choices:
devise a more precise data representation for the post-parsing AST, that would be different from the post-typing AST, and not have those s_var_type fields. Typing would then be a conversion from the untyped to the typed AST. This is a clean solution that I would recommend.
admit that you must break the data representation semantics because you don't have enough information at this stage, and try to be at peace with the idea of returning garbage such as St_void after the parsing phase, to reconstruct the correct information later. This is less typed (as you have an implicit assumption on your data which is not apparent in the type), more pragmatic, ugly but sometimes necessary. I don't think it's the right decision in this case, but you will encounter situations where it's better to be a bit less typed.
I think the specific choice of unique-id handling depends on your position on this more general question, and on your concrete decisions about types. If you choose a finer-typed representation of the post-parsing AST, it's your choice whether to include unique ids or not (I would, because generating a unique id is dead simple and doesn't need a separate pass, and I would rather slightly complicate the grammar productions than the typing phase). If you choose to hack the type field with a dummy value, it's also reasonable to do that for variable ids if you wish, putting 0 as a dummy value and defining it later; but I personally would still do it in the parsing phase.