I'm looking for a way to run spaCy offline, similar to this example from HuggingFace.
Basically, I want to use local files only and not have spaCy try to look anything up online when using a model.
That way, when I use a model it won't change even if an update is available; I'll keep using the model I have locally.
I have tried using nlp.to_disk("./en_example_pipeline") and then spacy.load("./en_example_pipeline"), but I'm not sure whether this method guarantees the model won't be updated if a new version is available. The docs aren't clear on this.
spaCy does not automatically update or download models on its own.
Pretrained pipelines provided by spaCy are only downloaded if you use the spacy download command or related functions.
In the specific case of spacy-transformers, which uses HuggingFace Transformers internally, if you specify a model and it's not already present, it will be downloaded. Whether a model is checked for updates when it is loaded depends on the HuggingFace library and the model implementation.
Models are never automatically checked for updates by spaCy's own code, but models can contain arbitrary code, so you may need to be careful about that. spaCy's pretrained pipelines, in particular, are not automatically checked for updates.
If you are using a pipeline you created yourself, like with nlp.to_disk, spaCy won't access the Internet (unless one of your own components is doing that).
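To make that concrete, here is a minimal sketch. It assumes en_core_web_sm is already installed locally; the two environment variables are HuggingFace-side settings (not part of spaCy) and only matter if your pipeline uses spacy-transformers.

    import os

    # HuggingFace settings: force transformers/huggingface_hub to use the local
    # cache only and never hit the network. Set these before the libraries are
    # imported. Only relevant for pipelines that use spacy-transformers.
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    os.environ["HF_HUB_OFFLINE"] = "1"

    import spacy

    # Save a pipeline to disk once (assumes en_core_web_sm is already installed)...
    nlp = spacy.load("en_core_web_sm")
    nlp.to_disk("./en_example_pipeline")

    # ...then load it purely from the local directory from now on.
    # Loading from a path does not contact the network on its own.
    nlp_local = spacy.load("./en_example_pipeline")
    doc = nlp_local("This ran entirely from local files.")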
For a personal project I'm trying to automate the workflow of my machine learning model, but I have some questions about how to approach this professionally.
At the moment I am doing the following tasks manually:
From the raw data, I extract the data that interests me into a directory using third-party software (to which I pass the extraction parameters as arguments).
Then I run another piece of software, or in some cases one (or more) of my Python scripts, to pre-process the data, which is stored in a new directory.
Finally, I feed the processed data to one of my models, which returns the labeled data that I store in a final directory.
(Process diagram of the steps described above.)
The steps (extract, pre-process, model) are always executed in the same order, but I change the scripts, software parameters, and model depending on my needs or the comparison I want to make.
All my scripts are stored in an organized script directory, and the third-party software is called from the command line by a Python script.
My goal is to have a script/tool that runs the whole loop by itself. As input it would take the raw data (or the directory where it is stored) and the parameters needed to run the loop with the desired modules (and their respective parameters).
The number of module and parameter combinations is so large that I can't write a script for each one, which is why I want to build something very modular.
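For illustration, here is a rough sketch of the kind of configurable loop I have in mind (all names, paths, and parameters are placeholders, not an actual implementation):

    import subprocess
    from pathlib import Path

    # Placeholder names throughout; each function wraps one of the existing
    # tools/scripts and is chosen per run via the config dict.

    def extract(raw_dir: Path, out_dir: Path, params: dict) -> Path:
        """Call the third-party extraction tool on the command line."""
        out_dir.mkdir(parents=True, exist_ok=True)
        cmd = ["extraction_tool", "--in", str(raw_dir), "--out", str(out_dir)]
        cmd += [f"--{key}={value}" for key, value in params.items()]
        subprocess.run(cmd, check=True)
        return out_dir

    def preprocess(in_dir: Path, out_dir: Path, params: dict) -> Path:
        """One of my pre-processing scripts; the real logic depends on the run."""
        out_dir.mkdir(parents=True, exist_ok=True)
        # ... pre-processing code or a call to another script goes here ...
        return out_dir

    def label(in_dir: Path, out_dir: Path, model_name: str, params: dict) -> Path:
        """Load the chosen model and write the labeled data."""
        out_dir.mkdir(parents=True, exist_ok=True)
        # ... model loading and inference go here ...
        return out_dir

    def run_pipeline(raw_dir: Path, work_dir: Path, config: dict) -> Path:
        extracted = extract(raw_dir, work_dir / "extracted", config["extract"])
        processed = preprocess(extracted, work_dir / "processed", config["preprocess"])
        return label(processed, work_dir / "labeled",
                     config["model"]["name"], config["model"]["params"])

    # One run = one config dict; comparing modules or parameters just means
    # calling run_pipeline with a different config.
    config = {
        "extract": {"format": "csv"},
        "preprocess": {"normalize": True},
        "model": {"name": "model_a", "params": {"threshold": 0.5}},
    }
    # run_pipeline(Path("raw_data"), Path("runs/run_001"), config)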
I could code my own script, but I would like to take a more professional approach, as if I had to implement it for a company.
My questions: In my case (customizable/interchangeable modules), would it be more appropriate to use a framework (e.g. Kedro or another one) or to build it myself (because my needs are too specific)? If a framework is appropriate, which one should I choose (and why)?
I've been researching existing frameworks, but besides not being sure they fit my needs, there are so many of them that I'd like to invest my time in one that could also help me in future projects or professional work.
Thank you.
I have a 3D model which I'm setting up for AR using Reality Composer on a Mac. I've assigned a few basic behaviours to the model. For example, making certain objects spin when tapped.
The problem: I will need to make changes to the model based on client requests. When I make the changes and import the new model into Reality Composer the behaviours seem to be "unlinked". I'm hoping there's a way to import an updated model without having to set up the behaviours every time.
I'm new to Reality Composer so I might be missing something obvious.
I tried using the built-in "replace" option in the right-click menu. This does replace the model, but not without breaking the behaviours.
I have a stream of user-item pairs; I build a model from the last 6M records and rebuild it every minute. I don't like that between these rebuilds some important data might go unused. For example, a new user may have joined the system, but the model doesn't know about him yet. I've found the class PlusAnonymousConcurrentUserDataModel, which allows adding a few entries to the model to get a more accurate recommendation. However, the documentation proposes a more constrained usage scenario for it; I have to:
allocate a temporary user
add the extra data
get a recommendation
and then release the user and the extra data
Is it OK to use this class to collect data iteratively until the model is actually rebuilt by the timer? What is the right way to do this? It seems that PlusAnonymousConcurrentUserDataModel is intended for a somewhat different purpose.
This part of Mahout is very old and being deprecated. I don't think it is even in the 0.14.0 build; you would have to build it from source.
Mahout now uses a whole new technology for recommending. The new algorithm is called Correlated Cross-Occurrence (CCO). The old method you are using does not make use of real-time input in the way you have outlined. CCO can recommend to anonymous users that have not been built into the model, as long as there is behavioral data for them in some form.
The architecture to implement CCO requires a datastore in a DB and a KNN engine (search engine) to make model queries. These are all packaged together in Apache PredictionIO + the Universal Recommender template.
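As a rough sketch of how that looks from client code (assuming the default PredictionIO ports and endpoint paths and the Universal Recommender's usual query fields; check the docs for your setup), you stream events to the event server and query the deployed engine over HTTP:

    import requests

    # Rough sketch, not production code. Ports (7070 for the event server,
    # 8000 for the deployed engine), endpoint paths, and field names below are
    # the usual PredictionIO / Universal Recommender defaults and may differ
    # in your installation.

    ACCESS_KEY = "YOUR_APP_ACCESS_KEY"  # placeholder

    # 1. Stream behavioral data (user-item events) into the event server as it
    #    arrives, independently of when the model was last trained.
    event = {
        "event": "view",
        "entityType": "user",
        "entityId": "user-123",
        "targetEntityType": "item",
        "targetEntityId": "item-456",
    }
    requests.post(
        "http://localhost:7070/events.json",
        params={"accessKey": ACCESS_KEY},
        json=event,
        timeout=10,
    )

    # 2. Query the deployed engine. CCO scores the user's recent behavior in the
    #    event store against the trained model, so even a user who joined after
    #    the last training run can get recommendations.
    query = {"user": "user-123", "num": 10}
    resp = requests.post("http://localhost:8000/queries.json", json=query, timeout=10)
    print(resp.json())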
Community support for the Universal Recommender itself can be found here: https://groups.google.com/forum/#!forum/actionml-user or on the mailing lists of the other projects.
Will there be an equivalent of the C# Reflection.Emit namespace in Dart?
Reflection.Emit has a number of classes that are used to build types at run time: adding properties, configuring their getters and setters, and building methods and event handlers, all at run time, which is really powerful when it comes to metaprogramming.
My idea is to generate my data models at run time and cache them in a map, so I can create instances at run time and add new methods and properties to them when I need to, without having to use mirrors often after generating the class. This could be really useful when writing ORMs and more dynamic applications, where you use reflection once rather than every time you need to modify an instance.
My questions are:
Will there be such a thing in future versions of Dart? They mention something about a Mirror Builder, but I'm not sure whether it does the same thing. Can someone please confirm whether that's what a Mirror Builder is about?
Another question: if I am able to generate my data types on the server as strings, is there a way to compile them before sending them to the client, put them in a Map, and use that Map to create instances?
I have seen discussions suggesting this should be supported at some point, but as far as I know, work on it will not start in the near future.
Similar requirements are usually solved by code generation at build time (Polymer, Angular, and others) using transformers, which analyze the code and generate code for reflective property access or for code snippets in HTML.
Smoke is a package that aims to simplify this.
Code generation has the advantage that the amount of code needed to be downloaded by the client is much smaller.
When you do code generation at runtime you need a compiler and that is a lot of code that needs to be downloaded into the browser.
try.dartlang.org takes such an approach. The source is available here: https://code.google.com/p/dart/source/browse/branches/bleeding_edge/dart/site/try/
It includes dart2js (built to JavaScript) and runs a background isolate that compiles the Dart code to JS.
I'm in the process of a manual Core Data migration and keep running into Cocoa error 134140: NSMigrationMissingMappingModelError. I've noticed this happens any time I make any change to the model, even something as small as marking a property as optional. So far, the only solution I've found is to delete my mapping model and create a new one. Are there any better, less tedious solutions?
There's a menu option to resolve this. If you update your model anytime after creating your mapping model just do the following:
Select the mapping model.
Choose Editor -> Refresh Data Models.
This happens because:
The migration map identifies the model files by the entity hashes, and
When you change an entity, you change its hash.
When you change the model, the map no longer matches it, and migration fails because no matching map can be found.
The workaround is to not mess with migration until you've nailed down what the new model looks like. Then create the map with the final version of the model. If you can't finalize the new model and need to work on migration, you've already discovered the necessary procedure.
Tom is correct but I would take it one further. I would not do a manual/heavy migration, ever. If it cannot be done in a lightweight migration consider doing an export/import. It will be faster and more performant than a heavy migration.
My standard recommendation is to keep your changes small enough so that you can always do a lightweight migration.
Update on Import/Export
Heavyweight migration is a hold-over from OS X where memory was cheap. It should not be used in iOS. So what is the right answer?
My recommendation to people is to handle it on your own. Lightweight migration if at all possible, even if it requires walking through several models to get from A to B. However in your case that does not sound possible.
So the second option is export/import. It is very easy to export Core Data out to JSON. I even did a quick example in a Stack Overflow post about it.
First, you stand up the old model and the current store. This involves finding the right model version and manually loading it using [[NSManagedObjectModel alloc] initWithContentsOfURL:], pointing it at that model version. There are details on how to find the right model version in my book (grin).
Then export the current model out to JSON. That should be fairly quick. However, don't do this in your -applicationDidFinish.. for obvious reasons.
Step two is to load up the new Core Data stack with the "current" model and import that JSON. Since that JSON is in a known format, you can import it fairly easily.
This will allow you to control the entire experience and avoid the issues that heavy migration has.