LuaJIT/PhysFS mutex deadlock (Lua)

I've got the following code:
local ffi = require "ffi"
local M = ffi.load "physfs"
ffi.cdef [[ //basically the preprocessed content of physfs.h, see http://icculus.org/physfs/docs/html/physfs_8h.html ]]
M.PHYSFS_init(arg[0])
M.PHYSFS_setSaneConfig("a", "b", "zip", 0, 0)

function file2str(path)
  local cpath = ffi.cast("const char *", path)
  print(1) -- debug
  if M.PHYSFS_exists(cpath) == 0 then return nil, "file not found" end
  print(2) -- debug
  -- some more magic
end
assert(file2str("someFile.txt"))
When calling it, I expect debug output 1 and 2, or at least the assert triggering, but I only get:
1
["endless" (i pressed ^C after about a minute) freeze]
when i finally got luajit to run in gdb, this is the backtrace when freezing:
(gdb) bt
#0 0x00007ffff37a5c40 in __pause_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff379bce6 in __pthread_mutex_lock_full (mutex=0x68cbf0)
at ../nptl/pthread_mutex_lock.c:354
#2 0x00007ffff606951f in __PHYSFS_platformGrabMutex (mutex=0x68cbf0)
at /home/kyra/YDist/src/physfs-2.0.3/platform/unix.c:403
#3 0x00007ffff606410d in PHYSFS_getWriteDir ()
at /home/kyra/YDist/src/physfs-2.0.3/physfs.c:913
#4 0x000000000045482b in ?? ()
#5 0x000000000043a829 in ?? ()
#6 0x000000000043af17 in ?? ()
#7 0x00000000004526a6 in ?? ()
#8 0x0000000000446fb0 in lua_pcall ()
#9 0x00000000004047dc in _start ()
So it seems to me that something is holding the mutex, which is strange, because while there are two threads running, only one even touches PhysFS (the second thread doesn't even ffi.load "physfs").
What could/should I do?

I still don't really know what the hell is going on, but while trying to further debug the mutex in gdb I LD_PRELOADed libpthread.so into the gdb process, and suddenly it worked.
Then I tried just preloading it into luajit without gdb; that also works.
Then I dug further into physfs and lualanes (a pthread FFI wrapper I'm using for threading) and found that both try to load libpthread if it isn't already loaded, but physfs does so from C while lanes uses the FFI, which somehow doesn't see the copy loaded by physfs, so the process ends up with two copies of the library loaded.
So the fix is to explicitly do an ffi.load "pthread" before ffi.load "physfs": while lanes can't see the version loaded by physfs, physfs is happy with the version loaded by us and doesn't try to load it again, and the LuaJIT FFI ignores the further load attempts made by lanes.
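A minimal sketch of that load order (keeping the returned namespace in a local is my own precaution so the handle isn't collected; the actual fix is simply that pthread is loaded before physfs):
local ffi = require "ffi"

-- Load libpthread first so there is exactly one copy in the process;
-- physfs then binds against this copy instead of pulling in its own,
-- and the FFI ignores the later load attempts made by lanes.
local pthread = ffi.load "pthread"
local physfs  = ffi.load "physfs"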

Related

Xorg crashing in custom image

I generated an image for an Advantech PCM-9375 board using the Yocto system (branch dunfell).
The resulting image uses Xorg as the display server; however, the X server crashes because of the Geode driver installed with it.
I debugged it and discovered that the crash happens when the driver function LXReadMSR is called with the parameters addr=0x80002000, lo=0xbffff994 and hi=0xbffff998. The last two are pointers, and their contents are 5136 and 0, respectively.
The snippet below is gdb's backtrace:
(gdb) bt
#0 0xb7693ba7 in LXReadMSR (hi=0xbffff998, lo=0xbffff994, addr=2147491840) at ../../xf86-video-geode-2.11.20/src/lx_driver.c:131
#1 LXReadMSR (addr=2147491840, lo=0xbffff994, hi=0xbffff998) at ../../xf86-video-geode-2.11.20/src/lx_driver.c:126
#2 0xb7681eef in msr_create_geodelink_table (gliu_nodes=0xb76b2880 <gliu_nodes>) at ../../xf86-video-geode-2.11.20/src/cim/cim_msr.c:199
#3 0xb7682400 in msr_init_table () at ../../xf86-video-geode-2.11.20/src/cim/cim_msr.c:82
#4 0xb7693282 in LXPreInit (pScrni=0x6a79e0, flags=0) at ../../xf86-video-geode-2.11.20/src/lx_driver.c:349
#5 0x00480986 in InitOutput (pScreenInfo=0x688280 <screenInfo>, argc=12, argv=0xbffffc44) at ../../../../xorg-server-1.20.14/hw/xfree86/common/xf86Init.c:522
#6 0x00444525 in dix_main (argc=12, argv=0xbffffc44, envp=0xbffffc78) at ../../xorg-server-1.20.14/dix/main.c:193
#7 0x0042d89b in main (argc=12, argv=0xbffffc44, envp=0xbffffc78) at ../../xorg-server-1.20.14/dix/stubmain.c:34
Looking in the processor's documentation, I found that addr points to the GLD_MSR_CAP register (chapter "6.6.1.1 GLD Capabilities MSR (GLD_MSR_CAP)"); however, I couldn't figure out what is happening.
Solutions tried:
Adding the kernel command-line parameter "iomem=relaxed", as pointed out by item 6 of the driver's README file in its GitHub repository;
Replacing the kernel configuration "CONFIG_BLK_DEV_CS5535=y" with "CONFIG_BLK_DEV_CS5536=y".
Neither of them worked.
Xorg version: 1.20.14
Geode driver version: 2.11.20
Did anyone have a similar problem? Does anyone know what's happening?
My next attempts will be to modify kernel config parameters, but there are a lot of them and I don't know which are related to the problem.
The problem was solved when the kernel option "CONFIG_X86_IOPL_IOPERM" was enabled.
I came to this solution after reading this post.

Dart AST library fails when compiled

I'm building a program that uses Dart's AST library, and it works fine as long as I use the Dart interpreter to run the program (dart filename.dart).
Once I compile the program (dart compile filename.dart), it can't load the file and I get this stack trace:
#0 _PhysicalFile.readAsStringSync (package:analyzer/file_system/physical_file_system.dart:184)
#1 FolderBasedDartSdk.languageVersion (package:analyzer/src/dart/sdk/sdk.dart:400)
#2 FeatureSetProvider.build (package:analyzer/src/dart/analysis/feature_set_provider.dart:143)
#3 AnalysisDriver._createFileTracker (package:analyzer/src/dart/analysis/driver.dart:1500)
#4 new AnalysisDriver (package:analyzer/src/dart/analysis/driver.dart:291)
#5 ContextBuilder.buildDriver (package:analyzer/src/context/builder.dart:119)
#6 ContextBuilderImpl.createContext (package:analyzer/src/dart/analysis/context_builder.dart:94)
#7 new AnalysisContextCollectionImpl (package:analyzer/src/dart/analysis/analysis_context_collection.dart:55)
#8 _createAnalysisContext (package:analyzer/dart/analysis/utilities.dart:125)
#9 resolveFile (package:analyzer/dart/analysis/utilities.dart:115)
#10 main (package:DartProjects/dartprojects.dart:122)
#11 _startIsolate.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:299)
#12 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:168)
I took a look at the source code to see where the error could be, and it seems that in package:analyzer/src/dart/sdk/sdk.dart it tries to get the language version file, but instead of using the PATH to locate my Dart SDK, it looks for it in my IntelliJ folder, which fails. I also tried to run it on a freshly created VM, and it fails there too.
Here is the code that produces this output:
import 'dart:io';
import 'package:analyzer/dart/analysis/utilities.dart';
import 'package:analyzer/dart/ast/ast.dart';
import 'package:analyzer/dart/ast/visitor.dart';

void main(List<String> arguments) async {
  final fileName = Directory.current.path + r'\test.dart';
  var source = null;
  try {
    source = await resolveFile(path: fileName);
  } catch (e, s) {
    print('${s}');
    return;
  }
}
Thanks for your help.
A workaround (admittedly not a good one): copy the "version" file from the Dart SDK folder to the root of your project.
The analyzer will pick it up and use it.
This works if your compiled file is in the "bin" folder.
P.S.
This is a hack rather than a proper fix.

Get true stack trace of an error in lua pcall

So for my pcall statements, I've been doing something like this
local status, err = pcall(fn)
if not status then
  print(err)
  print(debug.stacktrace())
end
This works fine for some basic stuff but the issue is that debug.stacktrace() returns the CURRENT relative stack trace, not the stack trace of the error. If the error within fn happened 10 levels down in the stack, then I wouldn't know where exactly it occurred, just that this pcall block is failing. I was wondering if there was a way to get the stack trace of the pcall and not the current stack trace. I tried debug.stacktrace(err) but it didn't make a difference.
You need to use xpcall to provide a custom function that will add the stacktrace to the error message. From PiL:
Frequently, when an error happens, we want more debug information than
only the location where the error occurred. At least, we want a
traceback, showing the complete stack of calls leading to the error.
When pcall returns its error message, it destroys part of the stack
(the part that went from it to the error point). Consequently, if we
want a traceback, we must build it before pcall returns. To do that,
Lua provides the xpcall function. Besides the function to be called,
it receives a second argument, an error handler function. In case of
errors, Lua calls that error handler before the stack unwinds, so that
it can use the debug library to gather any extra information it wants
about the error.
You may want to check this patch that extends pcall to include stacktrace.
As suggested in the comments, you can use local ok, res = xpcall(f, debug.traceback, args...) with Lua 5.2+ or LuaJIT (with Lua 5.2 compatibility turned on), and use the patch mentioned above for Lua 5.1.
The basic problem is (roughly) that pcall must unwind the stack so that your error handling code is reached. This gives two obvious ways to tackle the problem: Create the stack trace before unwinding, or move the (potentially) error-throwing code out of the way, so the stack frames don't have to be removed.
The first is handled by xpcall. This sets an error handler that can create a message while the stack is still intact. (Note that there are some situations where xpcall will not call the handler,1 so it's not suitable for cleanup code! But for stack traces, it's generally good enough.)
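For the first option, here is a minimal sketch (fn and its error are placeholders of mine); because the message handler runs before the stack is unwound, the resulting string contains the frames from inside fn, not just the xpcall site:
local function fn()
  -- stand-in for code that fails several levels deep
  error("something went wrong")
end

-- debug.traceback is used as the message handler: it is called while the
-- stack is still intact and returns the error message plus the traceback.
local ok, err = xpcall(fn, debug.traceback)
if not ok then
  print(err)
end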
The second option (this always works2) is to preserve the stack by moving the code to a different coroutine. Instead of
local ok, r1, r2, etc = pcall( f, ... )
do
local co = coroutine.create( f )
local ok, r1, r2, etc = coroutine.resume( co, ... )
and now the stack (in co) is still preserved and can be queried by debug.traceback( co ) or other debug functions.
If you want the full stack trace, you'll then have to collect both the stack trace inside the coroutine and the stack trace outside of it (where you currently are) and then combine both while dropping the first line of the latter:
local full_tb = debug.traceback( co )
.. debug.traceback( ):sub( 17 ) -- drop 'stack traceback:' line
1 One situation in which the handler isn't called is for OOMs:
g = ("a"):rep( 1024*1024*1024 ) -- a gigabyte of 'a's
-- fail() tries to create a 32GB string – make it larger if that doesn't OOM
fail = load( "return "..("g"):rep( 32, ".." ), "(replicator)" )
-- plain call errors without traceback
fail()
--> not enough memory
-- xpcall does not call the handler either:
xpcall( fail, function(...) print( "handler:", ... ) return ... end, "foo" )
--> false not enough memory
-- (for comparison: here, the handler is called)
xpcall( error, function(...) print( "handler:", ... ) return ... end, "foo" )
--> handler: foo
-- false foo
-- coroutine preserves the stack anyway:
do
local co = coroutine.create( fail )
print( "result:", coroutine.resume( fail ) )
print( debug.traceback( co ) .. debug.traceback( ):sub( 17 ) )
end
--> result: false not enough memory
--> stack traceback:
-- [string "(replicator)"]:1: in function 'fail'
-- stdin:4: in main chunk
-- [C]: in ?
2 Well, at least as long as Lua itself doesn't crash.

Issue with neuralnetwork_tutorial.lua and data preprocessing

I have installed the torch deep learning module by first git-cloning it and then running luarocks make, and the installation was successful. require 'dp' works well at the torch prompt.
But when I try to execute neuralnetwork_tutorial.lua (th neuralnetwork_tutorial.lua), it throws the following errors.
Tanny #neuralnetwork_tutorial.lua: About to initiate: datasource = dp.Mnist{input_preprocess = dp.Standardize()}
Tanny #/home/ubuntu/binaries/torches/torch/install/share/lua/5.1/dp/preprocess/standardize.lua: Marked presence!!!
Tanny #/home/ubuntu/binaries/torches/torch/install/share/lua/5.1/dp/torch/File.lua says: #177 typeidx= 3
Tanny #/home/ubuntu/binaries/torches/torch/install/share/lua/5.1/dp/torch/File.lua says: #177 typeidx= 1
Tanny #/home/ubuntu/binaries/torches/torch/install/share/lua/5.1/dp/torch/File.lua says: #177 typeidx= 4
Tanny #/home/ubuntu/binaries/torches/torch/install/share/lua/5.1/dp/torch/File.lua says: #177 typeidx= 0
Tanny #/home/ubuntu/binaries/torches/torch/install/share/lua/5.1/dp/torch/File.lua says: #177 typeidx= 28
Tanny #/home/ubuntu/binaries/torches/torch/install/share/lua/5.1/dp/torch/File.lua says: #259 typeidx= 28
/home/ubuntu/binaries/torches/torch/install/bin/luajit: ...aries/torches/torch/install/share/lua/5.1/torch/File.lua:260: unknown object
stack traceback:
[C]: in function 'error'
...aries/torches/torch/install/share/lua/5.1/torch/File.lua:260: in function 'readObject'
...aries/torches/torch/install/share/lua/5.1/torch/File.lua:252: in function 'readObject'
...aries/torches/torch/install/share/lua/5.1/torch/File.lua:277: in function 'loadData'
...es/torches/torch/install/share/lua/5.1/dp/data/mnist.lua:74: in function 'loadTrainValid'
...es/torches/torch/install/share/lua/5.1/dp/data/mnist.lua:61: in function '__init'
...aries/torches/torch/install/share/lua/5.1/torch/init.lua:50: in function <...aries/torches/torch/install/share/lua/5.1/torch/init.lua:46>
[C]: in function 'Mnist'
neuralnetwork_tutorial.lua:16: in main chunk
[C]: in function 'dofile'
...ches/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x0804d650
I put some print statements in those scripts to understand the flow. I happened to notice that in File.lua the first step after getting the object is to determine its type, of which 8 have been declared, numbered 0 to 7 with 0 being TYPE_NIL. However, the code fails because it detects a type 28 (??).
Any help on where I am going wrong, or where to look to find the issue?
P.S.: The script downloads the data on its own; however, due to corporate proxy settings it could not download, so I downloaded the MNIST data manually and stored it in the expected data directory. Could this be a clue?
Okay, so it was a bug in the code (the serialized MNIST data wasn't cross-platform). Fixed by serializing the dataset in ASCII format instead of binary.
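A rough illustration of that fix, assuming it is done on a machine that can still read the binary file (the paths are placeholders; torch.save takes an optional format argument):
require 'torch'

-- Read the dataset that was serialized in the platform-dependent binary
-- format, then write it back out in the portable ASCII format.
local data = torch.load('mnist/train.th7')
torch.save('mnist/train_ascii.th7', data, 'ascii')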

got SIGSEGV while calling 'require ("lsqlite3")' with lua 5.1.5

I had built Lua 5.1.5 and lsqlite3-0.8.1. All of them run well on my Red Hat Linux.
I then ported them to my MIPS development board. Lua and other modules (such as luafilesystem, md5, cgilua and wsapi) run well, but lsqlite3 does not work.
When I execute require("lsqlite3") on the Lua command line, it returns the error messages below:
lua
Lua 5.1.5 Copyright (C) 1994-2012 Lua.org, PUC-Rio
require("lsqlite3")
do_page_fault() #2: sending SIGSEGV to lua for invalid read access from
00000000 (epc == 00000000, ra == 2ac36144)
Segmentation fault
Can anyone give me any help to fix it? Thanks!
I made some progress in solving this problem: I rebuilt Lua with the gcc option '-Wl,-E' and then rebuilt lsqlite3. I executed require("lsqlite3") on the Lua command line, and it didn't print any message. I continued running some other database operation commands and found that they all executed successfully. As it seemed the problem had been solved, I should have been very happy about it.
But another, stranger problem arose.
If I put the statement require("lsqlite3") into a file, and then execute the file this way:
lua file
it still printed error messages like this:
do_page_fault() #2: sending SIGSEGV to lua for invalid read access from
2ada054c (epc == 2ada054c, ra == 2abdceac)
If I put more database operation statements into a file and then run that file with lua, Lua gives the correct results of query operations and inserts values into tables correctly, but it always prints the error messages shown above.
If I run the statements from the file one by one in the Lua command-line interface, it never prints this error message.
It seems to print the error message when executing the 'require' function. But if I put require("lfs") into a file and run that file with lua, it never prints an error message.
I am confused about what the difference is between Lua command-line execution and running a Lua script.
There are three places in lsqlite3.c where sqlite_int64 is used (never long long directly). When you build sqlite3, some type will be used for 64-bit integers; lsqlite3 will use the same type by including sqlite3.h for the definition of that type.
