How to bind Lua with Mecab? - lua

I want to use MeCab from Lua, but I can't figure out the procedure for creating bindings. I am working on Windows 7. Does "binding" mean creating a shared library? If so, how do I do that? I have seen some binding files for Java; the files in the package org.chasen.mecab show that they were generated by SWIG, which confuses me. Where do those files come from, or do we write them ourselves? After creating the bindings, what else do I need to do to use MeCab in Lua? By the way, I use MinGW. Can someone give me some simple steps so that I can keep working on it?
To greatwolf:
I used the following commands:
swig -lua -c++ MeCab.i
g++ -c MeCab_wrap.cxx -I C:\Lua\5.1\include -I ..\src
g++ -LC:\Lua\5.1\lib -shared MeCab_wrap.o -llua51 -o MeCab.dll
and I got the errors below:
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x2960): undefined reference to `_imp___ZN5MeCab12createTaggerEPKc'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x2981): undefined reference to `_imp___ZN5MeCab12getLastErrorEv'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x29c9): undefined reference to `_imp___ZN5MeCab12createTaggerEPKc'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x29d9): undefined reference to `_imp___ZN5MeCab12getLastErrorEv'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x2a8d): undefined reference to `_imp___ZN5MeCab11createModelEPKc'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x2aae): undefined reference to `_imp___ZN5MeCab12getLastErrorEv'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x2af6): undefined reference to `_imp___ZN5MeCab11createModelEPKc'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x2b06): undefined reference to `_imp___ZN5MeCab12getLastErrorEv'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x2b6d): undefined reference to `_imp___ZN5MeCab13createLatticeEv'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x11050): undefined reference to `MeCab::Model::version()'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x11457): undefined reference to `MeCab::Model::create(int, char**)'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x11732): undefined reference to `MeCab::Model::create(char const*)'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x122bf): undefined reference to `MeCab::Tagger::parse(MeCab::Model const&, MeCab::Lattice*)'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x17737): undefined reference to `MeCab::Tagger::create(int, char**)'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x17a12): undefined reference to `MeCab::Tagger::create(char const*)'
MeCab_wrap.o:MeCab_wrap.cxx:(.text+0x17d83): undefined reference to `MeCab::Tagger::version()'
collect2: ld returned 1 exit status
It seems that these functions are declared in mecab.h, but I don't know how to resolve the errors.

It looks like mecab provides a swig file for automatically generating bindings. As luck would have it, lua is one of swig's supported targets.
A reasonable starting point would be to check out the corresponding makefile to see how bindings get created for other languages. From mecab's swig makefile:
SWIG = swig
PREFIX = MeCab
all: perl ruby python java csharp
# ...
perl:
	$(SWIG) -perl -shadow -c++ $(PREFIX).i
	mv -f $(PREFIX)_wrap.cxx ../perl
	mv -f $(PREFIX).pm ../perl
ruby:
	$(SWIG) -ruby -c++ $(PREFIX).i
	mv -f $(PREFIX)_wrap.cxx ../ruby/$(PREFIX)_wrap.cpp
python:
	$(SWIG) -python -shadow -c++ $(PREFIX).i
	mv -f $(PREFIX)_wrap.cxx ../python
	mv -f $(PREFIX).py ../python
# ...
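Extrapolating from the above, you can try generating the Lua bindings with something like the following (the -shadow flag in the Perl and Python rules is specific to those targets, so it is left out here):
swig -lua -c++ MeCab.i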
This substantially reduces the effort compared with writing the bindings by hand.
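The undefined references shown in the question's update come from the final link step, which names Lua's library but not MeCab's own. The _imp__ prefix on several of them indicates that the wrapper expects those functions to be imported from a DLL. Below is a minimal sketch of a corrected link line. It assumes that a MinGW-compatible libmecab import library sits in ../src and that it is linked as -lmecab; both are assumptions, so the script only prints the command for review instead of running it.

```shell
#!/bin/sh
# Assumed locations; override through the environment as needed.
LUA_DIR="${LUA_DIR:-C:/Lua/5.1}"          # taken from the commands in the question
MECAB_LIB_DIR="${MECAB_LIB_DIR:-../src}"  # assumption: holds libmecab's import library

# Print (do not run) the link command with MeCab's library added
# after the object file that needs it.
mecab_link_command() {
    printf '%s\n' "g++ -shared MeCab_wrap.o -L$LUA_DIR/lib -L$MECAB_LIB_DIR -llua51 -lmecab -o MeCab.dll"
}

mecab_link_command
```

Because the wrapper calls MeCab's C++ API, the library generally has to come from the same compiler family as the wrapper. A libmecab built with a different C++ compiler can leave the same symbols unresolved, since C++ name mangling differs between compilers.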

I don't know of any Lua bindings for mecab. Try googling for them first.
To create the bindings yourself you must be proficient in both the Lua C API and C (or C++). A deep understanding of Lua itself (the language, I mean) is advisable.
Search the Lua wiki for some more pointers, in particular the BindingCodeToLua page.

I recently needed this and since I didn't find anything I wrote a module so you can use Mecab with Lua.
It works like this:
mecab = require "mecab"
parser = mecab:new("") -- you can pass mecab config options here, like "-Owakati"
print(parser:parse("吾輩は猫である"))
You can also install it via LuaRocks as mecab.
It just provides access to the parse method of the Tagger class, but when working with Mecab that's all I've ever needed. If you'd like support for other Mecab features please feel free to file an issue on GitHub.

Related

Using Clang with built libstdc++ produces undefined symbol _ZSt15__once_callable

I have built libstdc++ with no modifications yet:
cd gccsrcdir/libstdc++-v3/build
../configure --prefix=$PWD/../install
make && make install
I am using Ubuntu 21.10 and I set the following environment variables:
export LIBRARY_PATH=gccsrcdir/libstdc++-v3/install/lib
export LD_LIBRARY_PATH=gccsrcdir/libstdc++-v3/install/lib
export CPLUS_INCLUDE_PATH=gccsrcdir/libstdc++-v3/install/include/c++/13.0.0
When I then use the system's GCC, I get no problems. When I use the system's Clang, it produces a symbol lookup error - even with no parameters:
clang++
clang++: symbol lookup error: /lib/x86_64-linux-gnu/libicuuc.so.67: undefined symbol: _ZSt15__once_callable, version GLIBCXX_3.4.11
In fact I only need to update LD_LIBRARY_PATH to arrive here. What am I doing wrong?
The symbol std::__once_callable is defined in your system libstdc++.so.6 (it has version GLIBCXX_3.4.11 in my build, which means it was added in GCC 4.4.0).
Your build of libstdc++.so.6 should define this symbol as well, but for some reason does not. That is a problem -- any binary which uses this symbol will fail at runtime when using your build of libstdc++.so.6 (which is happening because you've pointed LD_LIBRARY_PATH to it).
Note: in your case it's the clang++ binary that is failing to run -- any flags you add to it (such as -femulated-tls) are irrelevant -- they only affect the binary that would have been generated IF clang++ itself didn't fail.
I just repeated your configure && make steps, and the library built this way also doesn't define this symbol.
I then repeated the configure && make, but starting from top-level GCC directory, and libstdc++.so.6 built that way does define the symbol.
Conclusion: libstdc++ is configured differently during "normal" GCC build.
The definition comes from mutex.o, which is built from ./libstdc++-v3/src/c++11/mutex.cc, and which has this chunk of code:
#ifdef _GLIBCXX_HAS_GTHREADS
namespace std _GLIBCXX_VISIBILITY(default)
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION
#ifdef _GLIBCXX_HAVE_TLS
__thread void* __once_callable;
__thread void (*__once_call)();
...
So it sounds like either _GLIBCXX_HAS_GTHREADS or _GLIBCXX_HAVE_TLS is not defined when running configure && make directly in the libstdc++-v3 directory.
Digging further, I see that libstdc++-v3 determines _GLIBCXX_HAS_GTHREADS by trying to compile #include "gthr.h"; that file is available in libgcc/gthr.h, but not in a "standard" installed GCC.
../libstdc++-v3/configure && grep _GLIBCXX_HAS_GTHREADS config.h
/* #undef _GLIBCXX_HAS_GTHREADS */
TL;DR: correctly configuring libstdc++.so is complicated, and you will be better off building complete GCC.
Once you have a complete build, you will have a libstdc++-v3 directory properly configured, and can just rebuild in that directory:
grep _GLIBCXX_HAS_GTHREADS ./x86_64-pc-linux-gnu/libstdc++-v3/config.h
#define _GLIBCXX_HAS_GTHREADS 1
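Before pointing LD_LIBRARY_PATH at a freshly built library, it can help to confirm that the library exports the symbol in question. The following is a small sketch: the helper name has_defined_symbol is made up for illustration, and it only filters the text that nm prints. The sample line fed to it is illustrative, not output from the asker's build.

```shell
#!/bin/sh
# Hypothetical helper: reads `nm -D --defined-only` output on stdin and
# succeeds if SYMBOL ($1) is listed. Newer binutils print the name as
# "symbol@@VERSION", so an optional "@..." suffix is accepted.
has_defined_symbol() {
    grep -Eq "[[:space:]]${1}(@|\$)"
}

# Illustrative input line (not real output from the asker's build).
sample='0000000000000000 B _ZSt15__once_callable@@GLIBCXX_3.4.11'
if printf '%s\n' "$sample" | has_defined_symbol _ZSt15__once_callable; then
    echo "symbol is defined"
else
    echo "symbol is missing"
fi

# Against a real build (path from the question, not run here):
#   nm -D --defined-only gccsrcdir/libstdc++-v3/install/lib/libstdc++.so.6 \
#       | has_defined_symbol _ZSt15__once_callable
```

If the check fails for a library built standalone in libstdc++-v3 but passes for one from a full GCC build, that matches the configuration difference described above.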

stat method not found in libc.so.6

Using Dart FFI I'm trying to dynamically load the linux/posix 'stat' function.
I've assumed that the function is in the libc.so.6 library but when I attempt to load it I get the error:
Invalid argument(s): Failed to lookup symbol (/lib/x86_64-linux-gnu/libc.so.6: undefined symbol: stat)
I'm successfully loading other functions from the libc.so.6 library so my dynamic loading technique is working correctly.
I have two theories:
stat is a macro for xstat and as such stat no longer exists.
stat is in another library that I've not been able to grok.
Ideally I want to use stat rather than xstat as I need this code to also work on osx which as far as I can tell doesn't support xstat.
Help?
> I have two theories:
There is no need to theorize; you can just look:
echo "#include <sys/stat.h>" | gcc -xc - -E -dD | less
nm -AD /lib/x86_64-linux-gnu/*.so* | grep ' stat$'
will tell you everything you need to know (your first theory is correct).
> I want to use stat rather than xstat
You can't: it doesn't exist (when using GLIBC).
> I need this code to also work on osx which as far as I can tell doesn't support xstat.
Your code can detect the platform it's running on and adjust. This is the price of using non-portable mechanisms, such as FFI.

clang-3.8 and compiler-rt vs libgcc

I have been using clang-3.5 to happily build bitcode versions of musl libc and
use the result to produce nice stand alone executables.
Recent attempts with clang-3.8 have not been so happy. It seems that
the bitcode clang-3.8 generates uses functions defined in
compiler-rt/lib/builtins
Typical examples of functions I find polluting the bitcode are mulxc3, mulsc3, and muldc3. I can solve this by linking against libgcc, or even the llvm alternative if I had any clear idea of what that was. Though I would rather prevent the problem from happening in the first place.
I have seen mention of flags like rtlib=compiler-rt etc, but have found precious little documentation on the subject.
So here are some simple questions.
Is it possible to prevent clang from using the compiler-rt/lib/builtins
functions in the emitted bitcode? Or, if not:
Does llvm produce a version of libgcc that I could use? Actually I would
probably build a bitcode version of it, but that is beside the point.
Love to hear some guidance on this.
Added 12/8/2016: So I will illustrate my issues with a particular workflow that
people can reproduce if they wish, or, more likely, just point out where I am being stupid.
So start by checking out:
musllv
and follow the instructions in the README to compile (here I am using clang-3.8 on Ubuntu 14.04):
WLLVM_CONFIGURE_ONLY=1 CC=wllvm ./configure --target=LLVM --build=LLVM
make
cd lib
extract-bc -b libc.a
You will also need the bitcode of a simple executable. I will use nweb.c here.
wllvm nweb.c -o nweb
extract-bc nweb
Now we can do things like:
clang -static -nostdlib nweb.bc libc.a.bc crt1.o libc.a -o nweb
This workflow goes smoothly for clang-3.5 but for clang-3.8 we get:
clang -static -nostdlib nweb.bc libc.a.bc crt1.o libc.a -o nweb
/tmp/libc-f734a3.o: In function `cpowl':
libc.a.bc:(.text+0xbb9a): undefined reference to `__mulxc3'
/tmp/libc-f734a3.o: In function `cpowf':
libc.a.bc:(.text+0x38f7d): undefined reference to `__mulsc3'
/tmp/libc-f734a3.o: In function `csqrt':
libc.a.bc:(.text+0x78fc3): undefined reference to `__muldc3'
/tmp/libc-f734a3.o: In function `cpow':
libc.a.bc:(.text+0xafafc): undefined reference to `__muldc3'
clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)
So as @paul-brannan points out we could try
clang -static -nostdlib --rtlib=compiler-rt nweb.bc libc.a.bc crt1.o libc.a -o nweb
But this is where I am probably being stupid, because I get:
clang-3.8: warning: argument unused during compilation: '--rtlib=compiler-rt'
regardless of whether I use it as a linking or compiling flag.
OK so I finally managed to make headway on this. I built llvm-3.8.1 together with the compiler-rt project using wllvm and wllvm++.
One of the build products was libclang_rt.builtins-x86_64.a,
and from this archive I was able to extract the bitcode module
libclang_rt.builtins-x86_64.bc
using the command:
extract-bc -b libclang_rt.builtins-x86_64.a
This bitcode module has definitions for those pesky intrinsics like
__mulxc3, __mulsc3, and __muldc3.
Hallelujah!
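To see exactly which runtime helpers a failed link is missing, the linker output can be reduced to a plain list of symbol names. This is only a sketch: missing_symbols is a hypothetical helper, and the sample input is copied from the clang-3.8 error output quoted earlier.

```shell
#!/bin/sh
# Hypothetical helper: reads linker output on stdin and prints each
# unresolved symbol name once, sorted.
missing_symbols() {
    sed -n "s/.*undefined reference to \`\([^']*\)'.*/\1/p" | sort -u
}

# Sample input copied from the clang-3.8 error output.
missing_symbols <<'EOF'
/tmp/libc-f734a3.o: In function `cpowl':
libc.a.bc:(.text+0xbb9a): undefined reference to `__mulxc3'
libc.a.bc:(.text+0x78fc3): undefined reference to `__muldc3'
libc.a.bc:(.text+0xafafc): undefined reference to `__muldc3'
EOF
```

The resulting names can then be compared with the symbols defined in libclang_rt.builtins-x86_64.bc, for instance by listing that module with llvm-nm if it is installed.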

How to add directories to ld search path for a cross-compilation to ARM?

I am trying to configure util-linux to cross compile using arm-none-linux-gnueabi from CodeSourcery. My only problem so far is that it can't find my ncurses library which I compiled.
How can I add a directory to the ld search path? I've tried adding to my LIBRARY_PATH and LD_LIBRARY_PATH variables, but neither does anything. I know that I can add the -L flag to gcc and it will add to the linker path, but is there any way to do this globally, so that I can do it once, and not have to worry about it again?
Here is the output of arm-none-linux-gnueabi-gcc -print-search-dirs | grep libraries | sed 's/:/\n/g':
libraries
=/tools/bin/../lib/gcc/arm-none-linux-gnueabi/4.6.1/
/tools/bin/../lib/gcc/
/tools/bin/../lib/gcc/arm-none-linux-gnueabi/4.6.1/../../../../arm-none-linux-gnueabi/lib/arm-none-linux-gnueabi/4.6.1/
/tools/bin/../lib/gcc/arm-none-linux-gnueabi/4.6.1/../../../../arm-none-linux-gnueabi/lib/
/tools/bin/../arm-none-linux-gnueabi/libc/lib/arm-none-linux-gnueabi/4.6.1/
/tools/bin/../arm-none-linux-gnueabi/libc/lib/
/tools/bin/../arm-none-linux-gnueabi/libc/usr/lib/arm-none-linux-gnueabi/4.6.1/
/tools/bin/../arm-none-linux-gnueabi/libc/usr/lib/
I would like to add /arm/usr/lib and /arm/usr/local/lib to my ld search path.
If you need output from any other commands, just ask!
EDIT: I just found out about the CFLAGS environment variable--do all configure scripts/makefiles honor it?
Thank you!
If the ncurses library you compiled is going to be linked into the ARM binary you are cross-compiling, you cannot use LD_LIBRARY_PATH. LD_LIBRARY_PATH is consulted by the dynamic loader at run time; it does not tell the cross compiler or linker where to look for -l libraries when building your application.
Whether CFLAGS is honored depends on how the Makefile was written. A hand-written rule uses CFLAGS only if it references the variable, whereas configure scripts generated by the autoconf tools pick it up from the environment automatically. In the Makefiles, look for something like:
$(CC) $(CFLAGS) ....
If such a fragment exists, the Makefile uses the CFLAGS variable. For link-time options such as -L, LDFLAGS is the more appropriate variable.
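For a configure-based package such as util-linux, the extra library directories can be passed once at configure time instead of on every gcc command. Here is a minimal sketch, assuming the package's configure script is autoconf-generated; ldflags_from_dirs is a hypothetical helper, and the two directories are the ones named in the question.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of directories into -L options.
ldflags_from_dirs() {
    flags=""
    for dir in "$@"; do
        flags="${flags:+$flags }-L$dir"
    done
    printf '%s\n' "$flags"
}

# Directories taken from the question.
LDFLAGS="$(ldflags_from_dirs /arm/usr/lib /arm/usr/local/lib)"
echo "LDFLAGS=$LDFLAGS"

# Then, in the util-linux source tree (not run here):
#   ./configure --host=arm-none-linux-gnueabi LDFLAGS="$LDFLAGS"
```

Passing LDFLAGS on the configure command line records it in the generated Makefiles, so the setting does not have to be repeated for each later make invocation.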

MinGW gcc error: "undefined reference to `yylloc'"

Where can I find yylloc? I have included libfl.a (-lfl) in gcc command line, added GnuWin32/bin and GnuWin32/lib directories to system variable LIB, searched through all files in GnuWin32 - neither I nor gcc can find it.
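yylloc is a Bison variable: it is defined in the parser source that Bison generates, so it cannot be used without compiling and linking a Bison-generated .c file. libfl does not provide it.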

Resources