What's the idiomatic way to make replayable sequences in F#?

I just started using this year's Advent of Code to learn F# and I immediately stepped on a rake by trying to reuse the IEnumerable from File.ReadLines.
Here are all of the ways I see to solve this:
// Read all lines immediately into array/list
let linesAll = File.ReadAllLines "file.txt"
let linesArray = File.ReadLines "file.txt" |> Array.ofSeq
let linesList = File.ReadLines "file.txt" |> List.ofSeq
// Lazily load and cache for replays
let linesCache = File.ReadLines "file.txt" |> Seq.cache
// Start new filesystem read for every replay
let linesDelay = (fun () -> File.ReadLines "file.txt") |> Seq.delay
let linesSeqExpr = seq { yield! File.ReadLines "file.txt" }
Are these all semantically identical (for a read-only file)?
Are linesDelay and linesSeqExpr the only ones that don't read the entire file into memory?
Is linesList slowed down by having to assemble the list backwards?
Are any of these considered more or less idiomatic?
Edit
Here is code that reproduces my issue:
let lines = System.IO.File.ReadLines("alphabet.txt")
for i = 0 to 5 do
    let arr = Seq.zip lines (Seq.skip 1 lines) |> Array.ofSeq
    printfn "%A %A" i arr
gives output:
0 [|("A", "C"); ("D", "E"); ("F", "G"); ("H", "I"); ("J", "K"); ("L", "M");
("N", "O"); ("P", "Q"); ("R", "S"); ("T", "U"); ("V", "W"); ("X", "Y")|]
1 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
("Y", "Z")|]
2 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
("Y", "Z")|]
3 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
("Y", "Z")|]
4 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
("Y", "Z")|]
5 [|("A", "B"); ("B", "C"); ("C", "D"); ("D", "E"); ("E", "F"); ("F", "G");
("G", "H"); ("H", "I"); ("I", "J"); ("J", "K"); ("K", "L"); ("L", "M");
("M", "N"); ("N", "O"); ("O", "P"); ("P", "Q"); ("Q", "R"); ("R", "S");
("S", "T"); ("T", "U"); ("U", "V"); ("V", "W"); ("W", "X"); ("X", "Y");
("Y", "Z")|]
Looks like the Seq.zip lines (Seq.skip 1 lines) expression is triggering a bug by running two enumerations over the same sequence at the same time.
Edit 2
Reproduction in C#. The pairing is slightly different because I'm not skipping one element on the second enumerator.
var lines = File.ReadLines("alphabet.txt");
for (int i = 0; i < 5; i++)
{
    var zipped = new List<(string, string)>();
    var enum1 = lines.GetEnumerator();
    var enum2 = lines.GetEnumerator();
    while (enum1.MoveNext() && enum2.MoveNext())
    {
        zipped.Add((enum1.Current, enum2.Current));
    }
    Console.WriteLine($"{i} [{string.Join(',', zipped)}]");
}
0 [(A, B),(C, D),(E, F),(G, H),(I, J),(K, L),(M, N),(O, P),(Q, R),(S, T),(U, V),(W, X),(Y, Z)]
1 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
2 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
3 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
4 [(A, A),(B, B),(C, C),(D, D),(E, E),(F, F),(G, G),(H, H),(I, I),(J, J),(K, K),(L, L),(M, M),(N, N),(O, O),(P, P),(Q, Q),(R, R),(S, S),(T, T),(U, U),(V, V),(W, W),(X, X),(Y, Y),(Z, Z)]
Edit 3
This is a known issue and will not be fixed, in order to preserve compatibility:
// - IEnumerator<T> instances from the same IEnumerable<T> party on the same underlying
// reader.
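Given that behavior, the practical workaround is to materialize the lines once, so that every enumeration walks an in-memory snapshot instead of the shared reader. A minimal C# sketch (the ToArray call is the only addition to the repro above):
// Materialize once; each later GetEnumerator() call walks the array,
// not the single underlying StreamReader.
var lines = File.ReadLines("alphabet.txt").ToArray();
var zipped = lines.Zip(lines.Skip(1), (a, b) => (a, b)).ToList();
// Re-running the query now always yields (A, B), (B, C), ..., (Y, Z).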

What problem did you have when reusing the sequence from File.ReadLines? The following code works fine for me:
let lines = File.ReadLines "file.txt"
for line in lines do printfn "%s" line
for line in lines do printfn "%s" line
Anyway, here's my take on the answers to your questions:
Are these all semantically identical (for a read-only file)?
They're similar, but not identical, because they have different types; an array and a list, for example, don't have exactly the same semantics. (Also keep in mind that even a read-only file can be deleted, which will affect the lazy versions.)
Are linesDelay and linesSeqExpr the only ones that don't read the entire file into memory?
No, linesCache should also only read as many lines as are needed.
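For example, here is a minimal sketch of your repro with Seq.cache added; both enumerators then draw from the shared cache instead of the shared reader, so every iteration prints the same, correct pairs:
// Seq.cache reads each line from disk at most once and replays it afterwards.
let lines = System.IO.File.ReadLines "alphabet.txt" |> Seq.cache
for i = 0 to 5 do
    let arr = Seq.zip lines (Seq.skip 1 lines) |> Array.ofSeq
    printfn "%A %A" i arr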
Is linesList slowed down by having to assemble the list backwards?
I don't think so. See the source of the List.ofSeq primitive here.
Are any of these considered more or less idiomatic?
I think they're all fine, depending on the circumstance. Personally, I often just use File.ReadAllLines unless I have reason to believe the file is huge.

Related

Memory corruption with Fortran 90 deallocate statement

I've built a minimal example of distributing Fortran derived types using MPI_PACK, MPI_SEND, and MPI_RECV, and also exchanging their boundaries to test MPI_SENDRECV for MPI_PACKED derived types.
The code works overall, but it shows some strange behavior, which I attribute to memory corruption, if I put a deallocate statement in the middle of the code; it works fine with the deallocate statement at the end of the code. The deallocate statements are marked with (*) at the left side of the main program.
The flow of the code is:
1) MPI_PACK the whole derived type.
2) Distribute with MPI_SEND, MPI_RECV, and MPI_UNPACK, recovering the derived type structure.
3) MPI_PACK the boundaries of the distributed local derived type.
4) Exchange boundaries between adjacent processors using MPI_SENDRECV
I've put up the exact same code that I tested, so it should compile with something like mpif90 mod_data_structure.f90 main.f90 -o main, and the problem should be fully reproducible. The results below are the output from mpirun -np 2 main.
module mod_data_structure
  implicit none
  type type_cell
    real(selected_real_kind(15,307)) :: xc(2)
    real(selected_real_kind(15,307)) :: values_c(8)
    integer :: flag_boundary
  end type type_cell
  type type_cell_list
    type(type_cell) :: cell(13,13)
  end type type_cell_list
  type type_cell_list_local
    type(type_cell),allocatable :: cell(:,:)
  end type type_cell_list_local
end module mod_data_structure

program main
  use MPI
  use mod_data_structure
  implicit none
  integer,parameter :: nxmax = 9, nymax = 9, nbc = 2
  integer :: i, j, k, ii, jj
  type(type_cell_list) :: A
  type(type_cell_list_local) :: A_local
  type(type_cell) :: acell
  character(len=20) :: write_fmt
  ! MPI variables
  integer :: n_proc, my_id, ierr, source, dest
  integer :: tag, tag_send, tag_recv
  integer :: status ( MPI_STATUS_SIZE ), &
             status_l ( MPI_STATUS_SIZE ), &
             status_r ( MPI_STATUS_SIZE )
  integer,allocatable :: local_size(:), local_start(:)
  real(selected_real_kind(15,307)) :: tmp
  character,allocatable :: buffer(:), buffer_l(:), buffer_lg(:), buffer_r(:), buffer_rg(:)
  integer :: bufsize, bufsize_gc
  integer :: left_proc, right_proc
  integer :: DBL_SIZE, INT_SIZE, position_local
  integer :: position_l, position_r
  integer,allocatable :: position(:)

  call MPI_INIT ( ierr )
  call MPI_COMM_RANK ( MPI_COMM_WORLD, my_id, ierr )
  call MPI_COMM_SIZE ( MPI_COMM_WORLD, n_proc, ierr )
  call MPI_PACK_SIZE(1,MPI_DOUBLE_PRECISION,MPI_COMM_WORLD,DBL_SIZE,ierr)
  call MPI_PACK_SIZE(1,MPI_INTEGER         ,MPI_COMM_WORLD,INT_SIZE,ierr)

  ! Construct the derived data types
  if ( my_id .eq. 0 ) then
    do i = 1,nxmax+2*nbc
      do j = 1,nymax+2*nbc
        A%cell(i,j)%flag_boundary = 0
        do k = 1,8
          A%cell(i,j)%values_c(k) = 0.d0
        enddo
        do k = 1,2
          A%cell(i,j)%xc(k) = 0.d0
        enddo
      enddo
    enddo
    do i = 1+nbc,nxmax+nbc
      do j = 1+nbc,nymax+nbc
        ii = i - nbc
        jj = j - nbc
        A%cell(i,j)%flag_boundary = 10*ii + jj
        do k = 1,8
          A%cell(i,j)%values_c(k) = 1.d1*ii + jj + 0.1d0*k
        enddo
        do k = 1,2
          A%cell(i,j)%xc(k) = 1.d1*ii + jj + 0.1d0*k
        enddo
      enddo
    enddo
    write(write_fmt, '(a,i,a)') '(',nymax+2*nbc,'i3)'
    write(*,*) 'my_id ', my_id
    write(*,*) 'Total flag_boundary'
    do i = 1,nxmax+2*nbc
      write(*,write_fmt) A%cell(i,:)%flag_boundary
    enddo
    write(*,*) ' '
  endif

  !*** Test MPI_PACK and MPI_SEND / MPI_RECV
  ! Prepare for the distribution
  allocate ( local_size(n_proc), local_start(n_proc), position(n_proc) )
  local_size = 0
  local_start = 1
  tmp = (nymax+2*nbc) / n_proc
  ! 'local_size'
  do i = 1,n_proc-1
    local_size(i) = ceiling(tmp)
  enddo
  local_size(n_proc) = nymax + 2*nbc - (n_proc - 1)*ceiling(tmp)
  allocate ( A_local%cell(nxmax+2*nbc,local_size(my_id+1)) ) ! ###
  ! 'local_start'
  do i = 1,n_proc-1
    local_start(i+1:n_proc) = local_start(i+1:n_proc) + local_size(i)
  enddo
  ! allocate 'buffer'
  bufsize = maxval(local_size) * ( nxmax + 2*nbc ) * ( (8+2)*DBL_SIZE + (1)*INT_SIZE )
  allocate ( buffer(bufsize) )
  position = 0
  if ( my_id .eq. 0 ) then
    ! Assign 'A_local' for 'my_id .eq. 0' itself
    do j = 1, local_size(my_id+1)
      do i = 1, nxmax+2*nbc
        A_local%cell(i,j) = A%cell(i,j)
      enddo
    enddo
    do k = 2, n_proc ! w/o 'my_id .eq. 0' itself
      do j = local_start(k), local_start(k) + local_size(k) - 1
        do i = 1,nxmax+2*nbc
          acell = A%cell(i,j)
          call MPI_PACK(acell%xc,            2, MPI_DOUBLE_PRECISION, buffer, bufsize, position(k), MPI_COMM_WORLD, ierr)
          call MPI_PACK(acell%values_c,      8, MPI_DOUBLE_PRECISION, buffer, bufsize, position(k), MPI_COMM_WORLD, ierr)
          call MPI_PACK(acell%flag_boundary, 1, MPI_INTEGER         , buffer, bufsize, position(k), MPI_COMM_WORLD, ierr)
        enddo
      enddo
      dest = k-1 ! ###
      tag = k-1
      call MPI_SEND (buffer, bufsize, MPI_PACKED, dest, tag, MPI_COMM_WORLD, ierr )
    enddo
  else ! ( my_id .ne. 0 ) then
    source = 0
    tag = my_id
    call MPI_RECV (buffer, bufsize, MPI_PACKED, source, tag, MPI_COMM_WORLD, status, ierr )
    position_local = 0
    do j = 1, local_size(my_id+1)
      do i = 1, nxmax+2*nbc
        call MPI_UNPACK (buffer, bufsize, position_local, acell%xc,            2, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, ierr)
        call MPI_UNPACK (buffer, bufsize, position_local, acell%values_c,      8, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, ierr)
        call MPI_UNPACK (buffer, bufsize, position_local, acell%flag_boundary, 1, MPI_INTEGER         , MPI_COMM_WORLD, ierr)
        A_local%cell(i,j) = acell
      enddo
    enddo
  endif
(*)!deallocate ( buffer )
  do k = 1,n_proc
    if ( my_id .eq. (k-1) ) then
      write(write_fmt, '(a,i,a)') '(',local_size(my_id+1),'i3)'
      write(*,*) ' Before MPI_SENDRECV'
      write(*,*) 'my_id ', my_id
      write(*,*) 'cols ', local_size(my_id+1)
      do i = 1,nxmax+2*nbc
        write(*,write_fmt) A_local%cell(i,:)%flag_boundary
      enddo
      write(*,*) ' '
    endif
    !call MPI_BARRIER ( MPI_COMM_WORLD, ierr )
  enddo

  ! Test MPI_SENDRECV
  bufsize_gc = nbc * ( nxmax + 2*nbc ) * ( (8+2)*DBL_SIZE + (1)*INT_SIZE )
  allocate ( buffer_l(bufsize_gc), buffer_lg(bufsize_gc), buffer_r(bufsize_gc), buffer_rg(bufsize_gc) )
  ! 'left_proc'
  if ( my_id .eq. 0 ) then
    left_proc = MPI_PROC_NULL
  else ! ( my_id .ne. 0 ) then
    left_proc = my_id - 1
  endif
  ! 'right_proc'
  if ( my_id .eq. n_proc-1 ) then
    right_proc = MPI_PROC_NULL
  else ! ( my_id .ne. n_proc - 1 )
    right_proc = my_id + 1
  endif
  ! pack 'buffer_l' & 'buffer_r'
  position_l = 0
  do j = 1,nbc
    do i = 1,nxmax+2*nbc
      acell = A_local%cell(i,j)
      call MPI_PACK(acell%xc,            2, MPI_DOUBLE_PRECISION, buffer_l, bufsize_gc, position_l, MPI_COMM_WORLD, ierr)
      call MPI_PACK(acell%values_c,      8, MPI_DOUBLE_PRECISION, buffer_l, bufsize_gc, position_l, MPI_COMM_WORLD, ierr)
      call MPI_PACK(acell%flag_boundary, 1, MPI_INTEGER         , buffer_l, bufsize_gc, position_l, MPI_COMM_WORLD, ierr)
    enddo
  enddo
  position_r = 0
  do j = local_size(my_id+1)-nbc+1, local_size(my_id+1)-nbc+nbc
    do i = 1,nxmax+2*nbc
      acell = A_local%cell(i,j)
      call MPI_PACK(acell%xc,            2, MPI_DOUBLE_PRECISION, buffer_r, bufsize_gc, position_r, MPI_COMM_WORLD, ierr)
      call MPI_PACK(acell%values_c,      8, MPI_DOUBLE_PRECISION, buffer_r, bufsize_gc, position_r, MPI_COMM_WORLD, ierr)
      call MPI_PACK(acell%flag_boundary, 1, MPI_INTEGER         , buffer_r, bufsize_gc, position_r, MPI_COMM_WORLD, ierr)
    enddo
  enddo
  tag_send = my_id
  tag_recv = right_proc
  call MPI_SENDRECV (buffer_l,  bufsize_gc, MPI_PACKED, left_proc,  0, &
                     buffer_rg, bufsize_gc, MPI_PACKED, right_proc, 0, &
                     MPI_COMM_WORLD, status_l, ierr )
  tag_send = my_id
  tag_recv = left_proc
  call MPI_SENDRECV (buffer_r,  bufsize_gc, MPI_PACKED, right_proc, 0, &
                     buffer_lg, bufsize_gc, MPI_PACKED, left_proc,  0, &
                     MPI_COMM_WORLD, status_r, ierr )
  ! fill left boundary
  position_l = 0
  do j = 1,nbc
    do i = 1,nxmax+2*nbc
      call MPI_UNPACK (buffer_lg, bufsize_gc, position_l, acell%xc,            2, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, ierr)
      call MPI_UNPACK (buffer_lg, bufsize_gc, position_l, acell%values_c,      8, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, ierr)
      call MPI_UNPACK (buffer_lg, bufsize_gc, position_l, acell%flag_boundary, 1, MPI_INTEGER         , MPI_COMM_WORLD, ierr)
      A_local%cell(i,j) = acell
    enddo
  enddo
  ! fill right boundary
  position_r = 0
  do j = local_size(my_id+1)-nbc+1, local_size(my_id+1)-nbc+nbc
    do i = 1,nxmax+2*nbc
      call MPI_UNPACK (buffer_rg, bufsize_gc, position_r, acell%xc,            2, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, ierr)
      call MPI_UNPACK (buffer_rg, bufsize_gc, position_r, acell%values_c,      8, MPI_DOUBLE_PRECISION, MPI_COMM_WORLD, ierr)
      call MPI_UNPACK (buffer_rg, bufsize_gc, position_r, acell%flag_boundary, 1, MPI_INTEGER         , MPI_COMM_WORLD, ierr)
      A_local%cell(i,j) = acell
    enddo
  enddo
  do k = 1,n_proc
    if ( my_id .eq. (k-1) ) then
      write(write_fmt, '(a,i,a)') '(',local_size(my_id+1),'i3)'
      write(*,*) ' After MPI_SENDRECV'
      write(*,*) 'my_id ', my_id
      write(*,*) 'cols ', local_size(my_id+1)
      do i = 1,nxmax+2*nbc
        write(*,write_fmt) A_local%cell(i,:)%flag_boundary
      enddo
      write(*,*) ' '
    endif
    !call MPI_BARRIER ( MPI_COMM_WORLD, ierr )
  enddo
(*)deallocate ( buffer )
  deallocate ( buffer_l, buffer_lg, buffer_r, buffer_rg )
  call MPI_FINALIZE ( ierr )
end program
With deallocate(buffer) at the end of the code, the relevant part of the output looks like below, which is what I intended.
After MPI_SENDRECV
my_id 0
cols 6
0 0 0 0 0 0
0 0 0 0 0 0
0 0 11 12 15 16
0 0 21 22 25 26
0 0 31 32 35 36
0 0 41 42 45 46
0 0 51 52 55 56
0 0 61 62 65 66
0 0 71 72 75 76
0 0 81 82 85 86
0 0 91 92 95 96
0 0 0 0 0 0
0 0 0 0 0 0
But if I put deallocate(buffer) in the middle of the code, the same part of the output looks like this:
After MPI_SENDRECV
my_id 0
cols 6
0 0 0 0 0 0
****** 0 0 0 0
****** 11 12 15 16
****** 21 22 25 26
****** 31 32 35 36
****** 41 42 45 46
****** 51 52 55 56
****** 61 62 65 66
****** 71 72 75 76
****** 81 82 85 86
0 0 91 92 95 96
0 0 0 0 0 0
0 0 0 0 0 0
And if I change the write format to show more digits, these are 10-digit integers like 1079533568.
I've seen this kind of problem in Segmentation Fault using MPI_Sendrecv with a 2D contiguous array, but there was no clear answer as to why putting the deallocate statement in the middle of the code (for variables I don't use in the rest of the code) causes such a problem.
Where does this problem stem from?
I am not sure if I'm answering this question fully, but my practical experience with derived types is that the safest way to handle them across different MPI implementations is to avoid the advanced MPI constructs and keep all derived-type work on the Fortran side.
For example, I would write pure functions to pack and unpack your data types:
integer, parameter :: TYPE_CELL_BUFSIZE = 11

pure function type_cell_pack(this) result(buffer)
  class(type_cell), intent(in) :: this
  real(real64) :: buffer(TYPE_CELL_BUFSIZE)
  buffer(1:8)  = this%values_c
  buffer(9:10) = this%xc
  ! It will be faster to not use a separate MPI command for this only
  buffer(11)   = real(this%flag_boundary,real64)
end function type_cell_pack

pure type(type_cell) function type_cell_unpack(buffer) result(this)
  real(real64), intent(in) :: buffer(TYPE_CELL_BUFSIZE)
  this%values_c = buffer(1:8)
  this%xc = buffer(9:10)
  this%flag_boundary = nint(buffer(11))
end function type_cell_unpack
And then write two wrappers for MPI comms using MPI_send and MPI_recv only, like this for a scalar quantity:
subroutine type_cell_send_scalar(this,fromCpu,toCpu,mpiWorld)
  type(type_cell), intent(inout) :: this
  integer, intent(in) :: fromCpu,toCpu,mpiWorld
  real(real64) :: mpibuf(TYPE_CELL_BUFSIZE)
  if (cpuid==fromCpu) then
    mpibuf = type_cell_pack(this)
    call mpi_send(...,mpibuf,...,MPI_DOUBLE_PRECISION,...)
  elseif (cpuid==toCpu) then
    call mpi_recv(...,mpibuf,...,MPI_DOUBLE_PRECISION,...)
    this = type_cell_unpack(mpibuf)
  endif
end subroutine type_cell_send_scalar
And the following for an array quantity:
subroutine type_cell_send_array(these,fromCpu,toCpu,mpiWorld)
  type(type_cell), intent(inout) :: these(:)
  integer, intent(in) :: fromCpu,toCpu,mpiWorld
  integer :: i,ncell,bufsize
  real(real64) :: mpibuf(TYPE_CELL_BUFSIZE*size(these))
  ncell = size(these)
  bufsize = ncell*TYPE_CELL_BUFSIZE
  if (cpuid==fromCpu) then
    do i=1,ncell
      mpibuf((i-1)*TYPE_CELL_BUFSIZE+1:i*TYPE_CELL_BUFSIZE) = type_cell_pack(these(i))
    end do
    call mpi_send(bufsize,mpibuf,...,MPI_DOUBLE_PRECISION,...)
  elseif (cpuid==toCpu) then
    call mpi_recv(bufsize,mpibuf,...,MPI_DOUBLE_PRECISION,...)
    do i=1,ncell
      these(i) = type_cell_unpack(mpibuf((i-1)*TYPE_CELL_BUFSIZE+1:i*TYPE_CELL_BUFSIZE))
    end do
  endif
end subroutine type_cell_send_array
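Before wiring these wrappers into MPI, it is worth checking that pack and unpack are inverses. Here is a minimal, MPI-free round-trip sketch (the test program and values are mine; the pack/unpack bodies mirror the functions above):
program test_cell_roundtrip
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: TYPE_CELL_BUFSIZE = 11
  type type_cell
    real(real64) :: xc(2)
    real(real64) :: values_c(8)
    integer      :: flag_boundary
  end type type_cell
  type(type_cell) :: a, b
  real(real64) :: buf(TYPE_CELL_BUFSIZE)
  integer :: k
  a%xc = [1.0_real64, 2.0_real64]
  a%values_c = [(real(k,real64), k = 1, 8)]
  a%flag_boundary = 42
  buf = type_cell_pack(a)    ! flatten the derived type into a plain real buffer
  b = type_cell_unpack(buf)  ! and rebuild it
  if (b%flag_boundary == a%flag_boundary .and. all(b%xc == a%xc)) then
    print *, 'round trip OK'
  endif
contains
  pure function type_cell_pack(this) result(buffer)
    type(type_cell), intent(in) :: this
    real(real64) :: buffer(TYPE_CELL_BUFSIZE)
    buffer(1:8)  = this%values_c
    buffer(9:10) = this%xc
    buffer(11)   = real(this%flag_boundary,real64)
  end function type_cell_pack
  pure function type_cell_unpack(buffer) result(this)
    real(real64), intent(in) :: buffer(TYPE_CELL_BUFSIZE)
    type(type_cell) :: this
    this%values_c = buffer(1:8)
    this%xc = buffer(9:10)
    this%flag_boundary = nint(buffer(11))
  end function type_cell_unpack
end program test_cell_roundtrip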

Getting the signature of a FunctionDecl

I got the FunctionDecl for the definition of a function. There is no declaration for this function.
For example:
int foo(char c, double d)
{
...
}
How do I get the signature (qualifiers, return type, function name, parameters) as a valid signature I could use to create a declaration?
I found that the easiest way is to use the lexer to get the signature of the function. Since I wanted to make a declaration out of a definition, I wanted the declaration to look exactly like the definition.
Therefore I defined a SourceRange from the start of the function to the beginning of the body of the function (minus the opening "{") and let the lexer give me this range as a string.
static std::string getDeclaration(const clang::FunctionDecl* D)
{
    clang::ASTContext& ctx = D->getASTContext();
    clang::SourceManager& mgr = ctx.getSourceManager();
    // Range from the start of the function to the start of its body.
    clang::SourceRange range = clang::SourceRange(D->getSourceRange().getBegin(),
                                                  D->getBody()->getSourceRange().getBegin());
    llvm::StringRef s = clang::Lexer::getSourceText(clang::CharSourceRange::getTokenRange(range),
                                                    mgr, ctx.getLangOpts());
    // Trim the trailing "{" (and the space before it), then terminate with ";".
    return s.substr(0, s.size() - 2).str().append(";");
}
This solution assumes that the FunctionDecl is a definition (i.e., that it has a body).
Maybe this is what you were looking for...
bool VisitDecl(Decl* D) {
    auto k = D->getDeclKindName();
    auto r = D->getSourceRange();
    auto b = r.getBegin();
    auto e = r.getEnd();
    auto& srcMgr = Context->getSourceManager();
    if (srcMgr.isInMainFile(b)) {
        auto d = depth - 2u;
        auto fname = srcMgr.getFilename(b);
        auto bOff = srcMgr.getFileOffset(b);
        auto eOff = srcMgr.getFileOffset(e);
        llvm::outs() << std::string(2*d,' ') << k << "Decl ";
        llvm::outs() << "<" << fname << ", " << bOff << ", " << eOff << "> ";
        if (D->getKind() == Decl::Kind::Function) {
            auto fnDecl = reinterpret_cast<FunctionDecl*>(D);
            llvm::outs() << fnDecl->getNameAsString() << " ";
            llvm::outs() << "'" << fnDecl->getType().getAsString() << "' ";
        } else if (D->getKind() == Decl::Kind::ParmVar) {
            auto pvDecl = reinterpret_cast<ParmVarDecl*>(D);
            llvm::outs() << pvDecl->getNameAsString() << " ";
            llvm::outs() << "'" << pvDecl->getType().getAsString() << "' ";
        }
        llvm::outs() << "\n";
    }
    return true;
}
Sample output:
FunctionDecl <foo.c, 48, 94> foo 'int (unsigned int)'
  ParmVarDecl <foo.c, 56, 69> x 'unsigned int'
  CompoundStmt <foo.c, 72, 94>
    ReturnStmt <foo.c, 76, 91>
      ParenExpr <foo.c, 83, 91>
        BinaryOperator <foo.c, 84, 17>
          ImplicitCastExpr <foo.c, 84, 84>
            DeclRefExpr <foo.c, 84, 84>
          ParenExpr <foo.c, 28, 45>
            BinaryOperator <foo.c, 29, 43>
              ParenExpr <foo.c, 29, 39>
                BinaryOperator <foo.c, 30, 12>
                  IntegerLiteral <foo.c, 30, 30>
                  IntegerLiteral <foo.c, 12, 12>
              IntegerLiteral <foo.c, 43, 43>
You will notice the reinterpret_cast<OtherDecl*>(D) calls. Decl is the base class of all the more specific AST declaration classes, such as FunctionDecl and ParmVarDecl, so reinterpreting the pointer after the kind check is allowed and gives you access to that particular node's attributes. Since these more specific nodes inherit from NamedDecl and ValueDecl, obtaining the function name and the function type (its signature) is simple. The same approach applies to the base class Stmt and its derived classes, such as the expression classes.
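As an aside, the same downcasts are more commonly written with LLVM's casting helpers, which verify the node's kind for you. Here is a sketch of the two branches above using llvm::dyn_cast:
// dyn_cast returns nullptr when the dynamic kind does not match, so the
// explicit getKind() comparison plus reinterpret_cast collapses into one test.
if (const auto* fnDecl = llvm::dyn_cast<clang::FunctionDecl>(D)) {
    llvm::outs() << fnDecl->getNameAsString() << " "
                 << "'" << fnDecl->getType().getAsString() << "' ";
} else if (const auto* pvDecl = llvm::dyn_cast<clang::ParmVarDecl>(D)) {
    llvm::outs() << pvDecl->getNameAsString() << " "
                 << "'" << pvDecl->getType().getAsString() << "' ";
}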

Is it possible to dump the EBNF/BNF grammar table of a pyparsing object?

Preface: this may be a stupid, uninformed question.
I have a grammar I wrote with the pyparsing library (and the help of Stack Overflow posts) that parses nested expressions with parentheses, curly braces, and square brackets. I'm curious what the productions in a grammar table would look like, and I was wondering if there is a way to generate this automatically for an arbitrary pyparsing context-free grammar.
For reference, the pyparsing grammar is defined here:
def parse_nestings(string, only_curl=False):
    r"""
    References:
        http://stackoverflow.com/questions/4801403/pyparsing-nested-mutiple-opener-clo

    CommandLine:
        python -m utool.util_gridsearch parse_nestings:1 --show

    Example:
        >>> from utool.util_gridsearch import *  # NOQA
        >>> import utool as ut
        >>> string = r'lambda u: sign(u) * abs(u)**3.0 * greater(u, 0)'
        >>> parsed_blocks = parse_nestings(string)
        >>> recombined = recombine_nestings(parsed_blocks)
        >>> print('PARSED_BLOCKS = ' + ut.repr3(parsed_blocks, nl=1))
        >>> print('recombined = %r' % (recombined,))
        >>> print('orig = %r' % (string,))
        PARSED_BLOCKS = [
            ('nonNested', 'lambda u: sign'),
            ('paren', [('ITEM', '('), ('nonNested', 'u'), ('ITEM', ')')]),
            ('nonNested', '* abs'),
            ('paren', [('ITEM', '('), ('nonNested', 'u'), ('ITEM', ')')]),
            ('nonNested', '**3.0 * greater'),
            ('paren', [('ITEM', '('), ('nonNested', 'u, 0'), ('ITEM', ')')]),
        ]

    Example:
        >>> from utool.util_gridsearch import *  # NOQA
        >>> import utool as ut
        >>> string = r'\chapter{Identification \textbf{foobar} workflow}\label{chap:application}'
        >>> parsed_blocks = parse_nestings(string)
        >>> print('PARSED_BLOCKS = ' + ut.repr3(parsed_blocks, nl=1))
        PARSED_BLOCKS = [
            ('nonNested', '\\chapter'),
            ('curl', [('ITEM', '{'), ('nonNested', 'Identification \\textbf'), ('curl', [('ITEM', '{'), ('nonNested', 'foobar'), ('ITEM', '}')]), ('nonNested', 'workflow'), ('ITEM', '}')]),
            ('nonNested', '\\label'),
            ('curl', [('ITEM', '{'), ('nonNested', 'chap:application'), ('ITEM', '}')]),
        ]
    """
    import utool as ut  # NOQA
    import pyparsing as pp

    def as_tagged(parent, doctag=None):
        """Returns the parse results as XML. Tags are created for tokens and lists that have defined results names."""
        namedItems = dict((v[1], k) for (k, vlist) in parent._ParseResults__tokdict.items()
                          for v in vlist)
        # collapse out indents if formatting is not desired
        parentTag = None
        if doctag is not None:
            parentTag = doctag
        else:
            if parent._ParseResults__name:
                parentTag = parent._ParseResults__name
        if not parentTag:
            parentTag = "ITEM"
        out = []
        for i, res in enumerate(parent._ParseResults__toklist):
            if isinstance(res, pp.ParseResults):
                if i in namedItems:
                    child = as_tagged(res, namedItems[i])
                else:
                    child = as_tagged(res, None)
                out.append(child)
            else:
                # individual token, see if there is a name for it
                resTag = None
                if i in namedItems:
                    resTag = namedItems[i]
                if not resTag:
                    resTag = "ITEM"
                child = (resTag, pp._ustr(res))
                out += [child]
        return (parentTag, out)

    def combine_nested(opener, closer, content, name=None):
        r"""
        opener, closer, content = '(', ')', nest_body
        """
        import utool as ut  # NOQA
        ret1 = pp.Forward()
        _NEST = ut.identity
        #_NEST = pp.Suppress
        opener_ = _NEST(opener)
        closer_ = _NEST(closer)
        group = pp.Group(opener_ + pp.ZeroOrMore(content) + closer_)
        ret2 = ret1 << group
        if ret2 is None:
            ret2 = ret1
        else:
            pass
            #raise AssertionError('Weird pyparsing behavior. Comment this line if encountered. pp.__version__ = %r' % (pp.__version__,))
        if name is None:
            ret3 = ret2
        else:
            ret3 = ret2.setResultsName(name)
        assert ret3 is not None, 'cannot have a None return'
        return ret3

    # Current Best Grammar
    nest_body = pp.Forward()
    nestedParens = combine_nested('(', ')', content=nest_body, name='paren')
    nestedBrackets = combine_nested('[', ']', content=nest_body, name='brak')
    nestedCurlies = combine_nested('{', '}', content=nest_body, name='curl')
    nonBracePrintables = ''.join(c for c in pp.printables if c not in '(){}[]') + ' '
    nonNested = pp.Word(nonBracePrintables).setResultsName('nonNested')
    nonNested = nonNested.leaveWhitespace()
    # if with_curl and not with_paren and not with_brak:
    if only_curl:
        # TODO figure out how to chain |
        nest_body << (nonNested | nestedCurlies)
    else:
        nest_body << (nonNested | nestedParens | nestedBrackets | nestedCurlies)
    nest_body = nest_body.leaveWhitespace()
    parser = pp.ZeroOrMore(nest_body)
    debug_ = ut.VERBOSE
    if len(string) > 0:
        tokens = parser.parseString(string)
        if debug_:
            print('string = %r' % (string,))
            print('tokens List: ' + ut.repr3(tokens.asList()))
            print('tokens XML: ' + tokens.asXML())
        parsed_blocks = as_tagged(tokens)[1]
        if debug_:
            print('PARSED_BLOCKS = ' + ut.repr3(parsed_blocks, nl=1))
    else:
        parsed_blocks = []
    return parsed_blocks

Creating a table in a table with the Lua C-API

I use this code to create a table inside a table (like a namespace) with the Lua C-API:
JNIEXPORT void JNICALL Java_com_naef_jnlua_LuaState_lua_1import_1tables(JNIEnv *env,
        jobject obj, jstring namespace) {
    lua_State *L;
    JNLUA_ENV(env);
    L = getluathread(obj);
    char * str = getstringchars(namespace);
    char ** res = NULL;
    char * p = strtok (str, ".");
    int n_spaces = 0, i;
    while (p) {
        res = realloc (res, sizeof (char*) * ++n_spaces);
        if (res == NULL)
            exit (-1);
        res[n_spaces-1] = p;
        p = strtok (NULL, ".");
    }
    for (i = 0; i < (n_spaces); ++i) {
        if (i == 0) {
            lua_newtable(L);
        } else if (i == (n_spaces - 1)) {
            lua_pushlstring(L, res[i], (sizeof(res[i])/sizeof(char))-1);
            lua_getglobal(L, res[i]);
            break;
        } else {
            lua_pushlstring(L, res[i], (sizeof(res[i])/sizeof(char))-1);
            lua_newtable(L);
        }
    }
    for (i = (n_spaces - 2); i >= 0 ; i--) {
        if (i == 0) {
            lua_setglobal(L, res[i]);
            break;
        } else {
            lua_settable(L, -3);
        }
    }
    free(res);
}
This should be equivalent to this hardcoded version:
lua_newtable( L );                      /* ==> stack: ..., {} */
{
    lua_pushliteral( L, "b" );          /* ==> stack: ..., {}, "b" */
    lua_newtable( L );                  /* ==> stack: ..., {}, "b", {} */
    {
        lua_pushliteral( L, "c" );      /* ==> stack: ..., {}, "b", {}, "c" */
        lua_newtable( L );              /* ==> stack: ..., {}, "b", {}, "c", {} */
        {
            lua_pushliteral( L, "d" );
            lua_getglobal(L, "MyTable");
            lua_settable( L, -3 );
        }
        lua_settable( L, -3 );          /* ==> stack: ..., {}, "b", {} */
    }
    lua_settable( L, -3 );              /* ==> stack: ..., {} */
}
lua_setglobal( L, "a" );                /* ==> stack: ... */
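For orientation, the net effect of that hardcoded sequence, expressed in Lua, is simply:
-- What the C code above builds (MyTable must already exist as a global):
a = { b = { c = { d = MyTable } } }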
When I call the function Java_com_naef_jnlua_LuaState_lua_1import_1tables(), the string looks like "com.naef.jnlua.test.fixture.TestObject" (TestObject is the equivalent of "MyTable" above; the other components, like "com", are the tables-in-tables, and TestObject is the last table).
When I afterwards try to execute the Lua code com.naef.jnlua.test.fixture.TestObject, I get:
attempt to index field 'naef' (a nil value)
Where is my mistake?

Rewriting a Z3_ast while traversing it in C++

The to_expr function leads to an error. Could you advise what is wrong below?
context z3_cont;
expr x = z3_cont.int_const("x");
expr y = z3_cont.int_const("y");
expr ge = ((y==3) && (x==2));
ge = swap_tree( ge );
where swap_tree is a function that is supposed to swap the operands of all binary operations. It is defined as follows:
expr swap_tree( expr e ) {
    Z3_ast ee[2];
    if ( e.is_app() && e.num_args() == 2) {
        for ( int i = 0; i < 2; ++i ) {
            ee[ 1 - i ] = swap_tree( e.arg(i) );
        }
        for ( int i = 0; i < 2; ++i ) {
            cout << " ee[" << i << "] : " << to_expr( z3_cont, ee[ i ] ) << endl;
        }
        return to_expr( z3_cont, Z3_update_term( z3_cont, e, 2, ee ) );
    }
    else
        return e;
}
The problem is reference counting. A Z3 object can be garbage collected by the system if its reference counter is 0. The Z3 C++ API provides "smart pointers" (expr, sort, ...) that automatically manage the reference counters for us. Your code uses Z3_ast ee[2]. In the for-loop, you store the result of swap_tree(e.arg(0)) into ee[0]. Since the reference counter is not incremented, this Z3 object may be deleted while the second iteration of the loop executes.
Here is a possible fix:
expr swap_tree( expr e ) {
    if ( e.is_app() && e.num_args() == 2) {
        // using smart pointers to store the intermediate results
        expr ee0(z3_cont), ee1(z3_cont);
        ee0 = swap_tree( e.arg(0) );
        ee1 = swap_tree( e.arg(1) );
        Z3_ast ee[2] = { ee1, ee0 };
        return to_expr( z3_cont, Z3_update_term( z3_cont, e, 2, ee ) );
    }
    else {
        return e;
    }
}
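For completeness, a small driver sketch (assuming z3_cont is the global context declared in the question) that exercises the fixed function:
// Driver sketch: swap_tree flips the operands at every level, so
// ((y == 3) && (x == 2)) should come back as ((2 == x) && (3 == y)).
int main() {
    expr x = z3_cont.int_const("x");
    expr y = z3_cont.int_const("y");
    expr ge = ((y == 3) && (x == 2));
    std::cout << "before: " << ge << "\n";
    ge = swap_tree(ge);
    std::cout << "after:  " << ge << "\n";
    return 0;
}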
