What are the rules for file extensions in Windows and Unix? - parsing

I'm currently using File::Basename's fileparse to separate out a file's directory, base file name and its extension, using something like this:
my($myfile_name,$mydirectory, $file_extension) = fileparse($$rhash_params{'storage_full_path_location'},'\..{1,4}');
But I see that there is a variant where you can provide an array of suffixes to the function; the array would contain all the known file extensions.
So I'm trying to find a safe way to do this, as I've seen that I've got some strange file names to process, e.g. file.0f1.htm, etc.
Questions:
1. Is there a list of commonly used extensions for Windows and Unix systems? In my case it's mainly for Windows.
2. And is it safe to assume that all file names in Windows have an extension that is exactly three characters long?
And if there's an even better way to do this, please share.
Thanks.
Updates:
So obviously I must have been drunk to forget about those other extensions. :)
Thus I've updated the current regex to allow 1-4 characters.
In this case, how should I change my regex line to properly match it?
Or is it an even better idea to collect all the commonly used extensions from Google and put them into an array to be passed to the function instead? My users are usually either students or teachers.

1. Is there a list of commonly used extensions for Windows and Unix systems? In my case it's mainly for Windows.
Yes, loads, all over the internet: http://www.google.com/search?q=common+file+extensions
2. And is it safe to assume that all file names in Windows have an extension that is exactly three characters long?
No, it's perfectly possible to use '.c', '.java', etc. in Windows.

There are several faulty assumptions in your code:
Files need not have extensions. For example, most binary executables on Unix/Linux/... don't have an extension at all. They are simply called "bash", "wget", "sed", "Xorg", ...
Extensions need not be three characters long, as @Alnitak already told you: ".c", ".java", ".mpeg", ".jpeg", ".html" are all perfectly fine and rather widespread extensions.
Cutting at the last "." is probably safer, but can still fail for files with no extension or with multiple (or multi-part) extensions such as ".tar.gz" or ".tar.bz2", which occur rather often in the Unix/Linux/... world.
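If you do cut at the last dot, a minimal sketch of that approach with fileparse (the path below is a made-up example) passes a suffix regex that matches only the final dot and whatever follows it:

use strict;
use warnings;
use File::Basename qw(fileparse);

# fileparse() matches the suffix patterns against the end of the name,
# so "a dot followed by non-dot characters" strips only the last extension.
my $path = '/some/dir/file.0f1.htm';                    # hypothetical example path
my ($name, $dir, $ext) = fileparse($path, qr/\.[^.]*/);
print "$dir | $name | $ext\n";                          # /some/dir/ | file.0f1 | .htm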

Microsoft Visual Studio extension (VSIX) lower case $safeprojectname$

Context
I'm developing a Microsoft Visual Studio extension, for which I've seen there are:
$projectname$ variable to get the name given to the project,
$safeprojectname$ variable to get the name given to the project with all unsafe characters and spaces replaced by underscores.
Source: https://learn.microsoft.com/en-us/visualstudio/ide/template-parameters?view=vs-2019
For example, with project name "Tata Yoyo SWIG" the variables will be:
$projectname$ = "Tata Yoyo SWIG",
$safeprojectname$ = "Tata_Yoyo_SWIG".
The extension I'm building is for SWIG projects that generate Java from C++. In this context there is a swig.exe call that, among other things, takes the Java package as a parameter, and I want that package to be all lower case. For now it is com.company.$safeprojectname$, which (stating the obvious) is only lower case if the project name itself is, so I currently have to convert it to lower case manually.
What I'm looking for
From the source page above (and other documentation pages) I've already seen that there is no $lowercasesafeprojectname$, for example. If anybody knows a way to do it with a function, a script or any other means, I would be glad.
Edit: while I want a lower-case safe project name for this purpose, I still want to keep the original $safeprojectname$, so even if @Ed Dore's answer is relevant it is not the solution for me.
In any case, do not hesitate if this is not clear or you want more information.
Thanks
If you implement a custom wizard (IWizard) with your template, you can replace the respective token values in the ReplacementsDictionary passed to your IWizard.RunStarted method, with lowercased equivalents.
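For illustration, here is a minimal sketch of such a wizard; the class name and the $lowercasesafeprojectname$ token are made up for this example, and the wizard assembly still has to be referenced from the template via a WizardExtension element:

using System.Collections.Generic;
using EnvDTE;
using Microsoft.VisualStudio.TemplateWizard;

public class LowercaseTokenWizard : IWizard
{
    public void RunStarted(object automationObject,
        Dictionary<string, string> replacementsDictionary,
        WizardRunKind runKind, object[] customParams)
    {
        // Add a new token rather than overwriting $safeprojectname$.
        if (replacementsDictionary.TryGetValue("$safeprojectname$", out var safeName))
            replacementsDictionary["$lowercasesafeprojectname$"] = safeName.ToLowerInvariant();
    }

    // The remaining IWizard members are not needed for this scenario.
    public void ProjectFinishedGenerating(Project project) { }
    public void ProjectItemFinishedGenerating(ProjectItem projectItem) { }
    public void BeforeOpeningFile(ProjectItem projectItem) { }
    public void RunFinished() { }
    public bool ShouldAddProjectItem(string filePath) => true;
}

The template can then use com.company.$lowercasesafeprojectname$ for the Java package while the original $safeprojectname$ remains available elsewhere.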

What is the recommended way to make & load a library?

I want to make a small "library" to be used by my future Maxima scripts, but I am not quite sure how to proceed (I use wxMaxima). Maxima's documentation covers the save(), load() and loadfile() functions, yet does not provide examples. Therefore, I am not sure whether I am using the proper/best way or not. My current solution, which is based on this post, stores my library in the *.lisp format.
As a simple example, let's say that my library defines the cosSin(x) function. I open a new session and define this function as
(%i0) cosSin(x) := cos(x) * sin(x);
I then save it to a lisp file located in the /tmp/ directory.
(%i1) save("/tmp/lib.lisp");
I then open a new instance of maxima and load the library
(%i0) loadfile("/tmp/lib.lisp");
The cosSin(x) is now defined and can be called
(%i1) cosSin(%pi/4)
(%o1) 1/2
However, I noticed that a substantial number of the libraries shipped with maxima are of *.mac format: the /usr/share/maxima/5.37.2/share/ directory contains 428 *.mac files and 516 *.lisp files. Is it a better format? How would I generate such files?
More generally, what are the different ways a library can be saved and loaded? What is the recommended approach?
Usually people put the functions they need in a file named something.mac, and then load("something.mac"); loads the functions into Maxima.
A file can contain any number of functions. A file can load other files, so if you have somethingA.mac and somethingB.mac, then you can have another file that just says load("somethingA.mac"); load("somethingB.mac");.
One can also create Lisp files and load them too, but it is not required to write functions in Lisp.
Unless you are specifically interested in writing Lisp functions, my advice is to write your functions in the Maxima language and put them in a file, using an ordinary text editor. Also, I recommend that you don't use save to save the functions to a file as Lisp code; just type the functions into a file, as Maxima code, with a plain text editor.
Take a look at the files in share to get a feeling for how other people have gone about it. I am looking right now at share/contrib/ggf.mac and I see it has a lengthy comment header describing its purpose -- such comments are always a good idea.
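As a sketch of that workflow (the file name mylib.mac and its contents are just an example), the library file is plain Maxima code written with a text editor:

/* mylib.mac -- ordinary Maxima code */
cosSin(x) := cos(x) * sin(x);
cosSin2(x) := cosSin(x)^2;

Any session can then pull it in with load:

(%i1) load("/tmp/mylib.mac");
(%i2) cosSin(%pi/4);
(%o2) 1/2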
For beginners, like me:
Menu Edit > Configure > Startup commands
Copy all the functions you have verified into the first box (this will write your wxmaxima-init.mac in the location indicated below).
Restart wxMaxima.
Now you can access the functions without any load() command.
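(The generated wxmaxima-init.mac itself is just ordinary Maxima code, for instance a line such as:)
cosSin(x) := cos(x) * sin(x);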

Options for MeCab Japanese tokenizer on iOS?

I'm using the iPhone library for MeCab found at https://github.com/FLCLjp/iPhone-libmecab . I'm having some trouble getting it to tokenize all possible words. Specifically, I cannot tokenize "吉本興業" into two pieces "吉本" and "興業". Are there any options that I could use to fix this? The iPhone library does not expose anything, but it uses C++ underneath the Objective-C wrapper. I assume there must be some sort of setting I could change to give more fine-grained control, but I have no idea where to start.
By the way, if anyone wants to tag this 'mecab' that would probably be appropriate. I'm not allowed to create new tags yet.
UPDATE: The iOS library is calling mecab_sparse_tonode2() defined in libmecab.cpp. If anyone could point me to some English documentation on that file it might be enough.
There is nothing iOS-specific in this. The dictionary you are using with mecab (probably ipadic) contains an entry for the company name 吉本興業. Although both parts of the name are listed as separate nouns as well, mecab has a strong preference to tag the compound name as one word.
Mecab lacks a feature that allows the user to choose whether or not compounds should be split into parts. Note that such a feature is generally hard to implement because not everyone agrees on which compounds can be split and which ones can't. E.g. is 容疑者 a compound made up of 容疑 and 者? From a purely morphological point of view perhaps yes, but for most practical applications probably no.
If you have a list of compounds you'd like to get segmented, a quick fix is to create a user dictionary for the parts they consist of, and make mecab use this in addition to the main dictionary.
There is Japanese documentation on how to do this here. For your particular example, it would involve the steps below.
Make a user dictionary with two entries, one for 吉本 and one for 興業:
吉本,,,100,名詞,固有名詞,人名,名,*,*,よしもと,ヨシモト,ヨシモト
興業,,,100,名詞,一般,*,*,*,*,こうぎょう,コウギョウ,コウギョウ
I suspect that both entries exist in the default dictionary already, but by adding them to a user dictionary and specifying a relatively low specificness indicator (I've used 100 for both -- the lower, the more likely to be split), you can get mecab to tend to prefer the parts over the whole.
Compile the user dictionary:
$> $MECAB/libexec/mecab/mecab-dict-index -d /usr/lib64/mecab/dic/ipadic -u mydic.dic -f utf-8 -t utf-8 ./mydic
You may have to adjust the command. The above assumes:
Mecab was installed from source in $MECAB. If you use mecab installed by a package manager, you might have difficulties finding the mecab-dict-index tool. Best install from source.
The default dictionary is in /usr/lib64/mecab/dic/ipadic. This is not part of the mecab package; it comes as a separate package (e.g. this) and you may have difficulties finding this, too.
mydic is the name of the user dictionary created in step 1. mydic.dic is the name of the compiled dictionary you'll get as output (needs not exist).
Both the system dictionary (-t option) and the user dictionary (-f option) are encoded in UTF-8. This may be wrong, in which case you'll get an error message later when you use mecab.
Modify the mecab configuration. In a system-wide installation, this is a file named /usr/lib64/mecab/dic/ipadic/dicrc or similar. In your case it may be located somewhere else. Add the following line to the end of the configuration file:
userdic = /home/myhome/mydic.dic
Make sure the absolute path to the dictionary compiled above is correct.
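(If you only want to test the user dictionary before editing dicrc, mecab can also take it directly on the command line; assuming the same path as above:)
$> mecab -u /home/myhome/mydic.dic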
If you then run mecab against your input, it will split the compound into its parts (I tested it, using mecab 0.994 on a Linux system).
A more thorough fix would be to get the source of the default dictionary and manually remove all compound nouns you want to get split, then recompile the dictionary. As a general remark, using a CJK tokenizer for a serious application in production mode over a longer period of time usually involves a certain amount of regular dictionary maintenance (adding/removing entries).

Xcode search: How to create a "Matches Regex" (regular expression) to select the correct files

I have a project with a lot of targets in Xcode. I want to customize my find to select only the files of one target.
My program has its files organized in this way:
Global files that all targets use
Global for devices (iPad or iPhone)
Specific for that target
So, for case 1, the file names are like this:
SCName1.m
SCOtherGlobal.m
Case 2 is:
SCiPhoneFile1.m
SCiPhoneFile2.m
SCiPadFile.m
SCiPadFile2.m
Case 3:
SCiPhoneAAFile1.m
SCiPadABFile1.m
SCiPadABOtherFile.m
So, for one target, I want to create a regular expression that searches for files matching:
SC* OR SCiPhone* OR SCiPhoneAA*
The more complete way is
(SC* AND NOT(SCiPhone*) AND NOT(SCiPad*)) OR
(SCiPhone* AND NOT(SCiPad*) AND NOT(SCiPhoneAB*)) OR
(SCiPhoneAA*)
I am a newbie with regular expressions and my expression isn't working. Is my logic correct? Does someone know how to create the correct regular expression?
Since your desire is to match file names that have common elements, I think the most efficient way (if not the only way) to do this is to use exclusion (that is, rule out anything that pertains to the iPad in any way or to the iPhoneAB device; any other file is fine).
So, for these inputs:
SCName1.m
SCOtherGlobal.m
SCiPhoneFile1.m
SCiPhoneFile2.m
SCiPadFile.m
SCiPadFile2.m
SCiPhoneAAFile1.m
SCiPhoneABFile2.m
SCiPadABFile1.m
SCiPadABOtherFile.m
I can match these files:
SCName1.m
SCOtherGlobal.m
SCiPhoneFile1.m
SCiPhoneFile2.m
SCiPhoneAAFile1.m
Using this expression (which uses negative look-ahead):
SC(?!iPad|iPhoneAB)[^.]*\.m
I hope this works for you! Also, I can break this down for you, if you'd like :D
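(If you want to sanity-check the pattern outside Xcode, one option is grep's Perl-compatible mode; this assumes GNU grep with -P support and a hypothetical files.txt containing the names above:)
$> grep -P 'SC(?!iPad|iPhoneAB)[^.]*\.m' files.txt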

Erlang: "extending" an existing module with new functions

I'm currently writing some functions related to lists that could possibly be reused.
My question is:
Are there any conventions or best practices for organizing such functions?
To frame this question, I would ideally like to "extend" the existing lists module such that I'm calling my new function the following way: lists:my_function(). At the moment I have lists_extensions:my_function(). Is there any way to do this?
I read about Erlang packages and that they are essentially namespaces in Erlang. Is it possible to define a new namespace for lists with new lists functions?
Note that I'm not looking to fork and change the standard lists module, but to find a way to define new functions in a new module also called lists, and to avoid the consequent naming collisions by using some kind of namespacing scheme.
Any advice or references would be appreciated.
Cheers.
To frame this question, I would ideally like to "extend" the existing lists module such that I'm calling my new function the following way: lists:my_function(). At the moment I have lists_extensions:my_function(). Is there any way to do this?
No, so far as I know.
I read about erlang packages and that they are essentially namespaces in Erlang. Is it possible to define a new namespace for Lists with new Lists functions?
They are experimental and not generally used. You could have a module called lists in a different namespace, but you would have trouble calling functions from the standard module in this namespace.
Let me give you reasons not to use lists:your_function() and to use lists_extension:your_function() instead:
Generally, the Erlang/OTP Design Guidelines state that each "application" -- libraries are also applications -- contains modules. You can ask the system which application introduced a specific module; this would break if modules were fragmented across applications.
However, I do understand why you would want a lists:your_function/N:
It's easier to use for the author of your_function, because they need your_function(...) a lot when working with []. But when another Erlang programmer -- who knows the stdlib -- reads this code, they will not know what it does. This is confusing.
It looks more concise than lists_extension:your_function/N. That's a matter of taste.
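For completeness, a minimal lists_extension module could look like this (the module name and the example function are just placeholders):

-module(lists_extension).
-export([my_function/1]).

%% Hypothetical example: drop consecutive duplicates from a list.
my_function([X, X | Rest]) -> my_function([X | Rest]);
my_function([X | Rest])    -> [X | my_function(Rest)];
my_function([])            -> [].

Callers then write lists_extension:my_function([1,1,2,2,3]) and get [1,2,3], and it is immediately clear that the function is not part of the stdlib.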
I think this method would work on any distro:
You can make an application that automatically rewrites the core Erlang modules of whichever distribution is running. Append your custom functions to the core modules and recompile them before compiling and running your own application that calls the custom functions. This doesn't require a custom distribution, just some careful planning and use of the file tools and BIFs for compiling and loading.
* You want to make sure you don't append your functions every time. Once you rewrite the file, the change is permanent unless the user replaces the file later. You could use a check with module_info to confirm whether your custom functions already exist, to decide whether you need to run the extension writer.
Pseudo Example:
lists_funs() -> ["myFun() -> <<"things to do">>."].
extend_lists() ->
{ok, Io} = file:open(?LISTS_MODULE_PATH, [append]),
lists:foreach(fun(Fun) -> io:format(Io,"~s~n",[Fun]) end, lists_funs()),
file:close(Io),
c(?LISTS_MODULE_PATH).
* You may want to keep copies of the original modules to restore if the compilation fails; that way you don't have to do anything drastic if you make a mistake in your list of functions, and you can use the copies as the source any time you want to rewrite the module to extend it with more functions.
* You could use a lists_extension module to keep all of the logic for your functions and just forward to it from lists with stubs such as funName(Args) -> lists_extension:funName(Args).
* You could also make an override system that searches for existing functions and rewrites them in a similar way but it is more complicated.
I'm sure there are plenty of ways to improve and optimize this method. I use something similar to update some of my own modules at runtime, so I don't see any reason it wouldn't work on core modules also.
I guess what you want is to have some of your functions accessible from the lists module. It is good that you want to convert commonly used code into a library.
One way to do this is to test your functions well and, if they are fine, copy the functions and paste them into the lists.erl module (WARNING: ensure you do not overwrite existing functions; just paste at the end of the file). This file can be found at $ERLANG_INSTALLATION_FOLDER/lib/stdlib-{$VERSION}/src/lists.erl. Make sure that you add your functions to those exported from the lists module (in the -export([your_function/1,.....])) to make them accessible from other modules. Save the file.
Once you have done this, we need to recompile the lists module. You could use an EmakeFile. The contents of this file would be as follows:
{"src/*", [verbose,report,strict_record_tests,warn_obsolete_guard,{outdir, "ebin"}]}.
Copy that text into a file called EmakeFile. Put this file in the path: $ERLANG_INSTALLATION_FOLDER/lib/stdlib-{$VERSION}/EmakeFile.
Once this is done, open an Erlang shell whose pwd(), the current working directory, is the path where the EmakeFile is, i.e. $ERLANG_INSTALLATION_FOLDER/lib/stdlib-{$VERSION}/.
Call the function: make:all() in the shell and you will see that the module lists is recompiled. Close the shell.
Once you open a new Erlang shell, and assuming you exported your functions in the lists module, they will work the way you want, right in the lists module.
Erlang being open source allows us to add functionality, recompile and reload the libraries. This should do what you want, success.
