Say “no” to import side‐effects in Python (chrismorgan.info)
148 points by chrismorgan on April 5, 2014 | 85 comments



I do most of my work with Perl, rather than Python, but in Perl there are several kinds of stock import-level side-effects which are actually quite helpful. These are pretty light-weight though. They boil down to:

1. Global lexical changes to Perl. Sometimes this is the whole point of an import (for example Carp::Always, which typically turns any warning or exception string into a full stack dump). I can't imagine doing something like this in Python. However, making something like this work properly without breaking too many things requires a heck of a lot of forethought. Yes, even Carp::Always may break something.

2. Manipulation of the importing module's symbol table. This is important for lexical extensions to Perl that you don't want to be global (for example Moo, Moose, and PGObject::Util::DBMethod). Among other things this allows MOPs to be added with greater sophistication than the language typically allows. I am not a Python guru but I could imagine metaprogramming side effects to be useful in setting up a consistent and powerful environment.

The problem the author describes is different, though: it is not only a side-effect issue but also a violation of separation of concerns. There are certain problems you do not want to solve at import time, and connecting with/configuring external components is almost always one of them.

Why? Because when integrating with external components, you almost always want the fine-tuning and decision-making to reside with the application developer. That's very different from setting up a consistent lexical programming environment for use (which is what the acceptable side effects do).


In Python, side-effects are simply not acceptable at all. One should instead put the code with the side-effect in a function and call it.

This is the approach taken by gevent, which allows you to replace the entire I/O stack. But importing it does not effect the change; you must explicitly call code to do that. This is done thus:

    from gevent.monkey import patch_all
    patch_all()
cf. http://www.gevent.org/gevent.monkey.html


In Perl the act of simply importing that patch_all function would be a side-effect, as it's implemented as code in pure Perl that changes the caller's symbol table. I suspect it would be fair to call your import up there a side-effect as well, but it slips the mind since it's implemented in the Python core. :)


Indeed. Also there are times (Moose for example) where you want to do more than import a function as written. For example, Moo and Moose create a custom function "has" at import time that is then attached to the caller's namespace.

The reason for this is that you want to fully attach the function in that namespace as a native method, and this requires some closures to work sanely. In short you want:

   package foo;
   use Moose;
   has bar => (is => 'rw');
to behave identically to:

   package notfoo;
   use foo;
   foo::has(bar => (is=>'rw'));
If you rely on the caller, the second example would add a bar accessor to the caller's namespace (notfoo) rather than to foo. In short, you don't want to use "has" in the Moose namespace; you want to add a custom function to the importing namespace.

As I understand it, you can't do this in Python, not because it is a bad idea (it is a very good idea sometimes) but because of limitations in the language: there are no multiline lambdas, which you would need in order to do anything useful with the symbol table here, unless there is an equivalent in terms of defining a method on the calling class inside a closure in the imported package. But again, my knowledge of Perl is much better than my knowledge of Python, so I could be wrong.

When you get into this sort of metaprogramming, custom symbol table manipulations are not only helpful but downright necessary. It is what allows you to add a sophisticated MOP when such is not included in core.


Technically, you can do this in Python, but you'll have to use some dark magic to get at the importing module's namespace in order to manipulate it. It'll not be pretty.
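Concretely, that dark magic looks something like this. This is a minimal sketch only: it relies on CPython frame internals (inspect.stack), and the injected Moose-style `has` helper is hypothetical; real import machinery is more involved.

    # injector.py: on import, inject a helper into the importer's
    # namespace. Illustration only; depends on CPython internals.
    import inspect

    def has(name, **options):
        # Hypothetical stand-in for a Moose-style attribute declarator.
        print("declaring attribute %r with %r" % (name, options))

    # Walk up the call stack past any import machinery frames and
    # drop `has` into the first real module's global namespace.
    for frame_info in inspect.stack()[1:]:
        caller_globals = frame_info[0].f_globals
        if not caller_globals.get("__name__", "").startswith("importlib"):
            caller_globals["has"] = has
            break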

Instead, in normal Python it is the importing module that must decide who has access to its own namespace, and who must live within a separated one. It has a completely different set of costs and benefits. (I sometimes wonder if Guido used Perl as a counterexample when creating Python - its principles are almost completely opposite.)


    from __future__ import print_function


As I understand it, "from __future__ import ..." statements are a special type of statement that doesn't actually import a module at all; they just use similar syntax for compatibility reasons. There is an actual __future__ module, also for compatibility reasons, but importing it has no side-effects.


Future statements do import something as well, when run, but their primary purpose is changing the behaviour of the compiler for the current file. They have no behaviour outside of that file, so that is not a publicly visible side-effect.
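To illustrate the file-local nature of the change (Python 2):

    # This future statement alters how *this file* is compiled;
    # other modules in the same process are unaffected.
    from __future__ import print_function

    print("hello", end="!\n")  # print is a function in this file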


Whether they import a module or not is, IMO, irrelevant. I was responding to this claim: "In Python, side-effects are simply not acceptable at all. One should instead put the code with the side-effect in a function and call it." `__future__` imports are obviously and purposefully side-effectful; there's no `print_function()` to call.


That's a side effect, right? Your module after the import is not the same as it was before, or am I missing something?


Future imports are not imports at all (the fact that they use the same syntax as imports is misleading IMO). They are tags that instruct the interpreter to turn on certain features only when interpreting the given file. They must appear before any statements in the file, and they take effect before interpretation of the file starts. You could argue that this constitutes an import side effect, but that's neither here nor there, and not helpful.


It only changes the behaviour for the current file, so it's not a side-effect.


That seems too narrow a view of side effect.

Which of the following are side effects?

1. Initializing a point-of-sale printer, checking for errors, and raising exceptions if, say, it is out of paper? Let's say this is a cash drawer driver and the cash drawer connects through the printer, and if the printer is out of paper, the drawer won't open properly (this happens, btw). I would call this a side effect as well as a separation-of-concerns violation. However, it is not likely to be a visible change to other modules.

2. Checking for the presence of a binary and, if found, caching the path to it, perhaps instantiating another object to do so? Definitely a side effect there, but not publicly visible.

3. Initializing an external library's environment (as happened in this case)? Done wrong, it crashes the system, but I suspect the segfault was not intended. Again, it isn't clear to me that a publicly visible side effect was intended.


Technically in Perl, a "use" statement is asking for those manipulations, though. If you want to just load the library but not ask for any of those things, you would require the library, either in or not in a BEGIN block depending on your desires. If, however, merely "require"ing the library performed any of those manipulations, that would be a violation of what I'd expect. (Of course, if the sole purpose of your module is to perform those manipulations, "require" may not do anything useful, but I'd still expect it not to do anything to my local namespace, either.)


On one machine I tried, help('modules') actually worked successfully with no substantial delays or apparent side effects.

On another, it apparently tried to set up an MPI cluster:

  *** The MPI_Init() function was called before MPI_INIT was invoked.
  *** This is disallowed by the MPI standard.
  *** Your MPI job will now abort.
  [hostname:14114] Abort before MPI_INIT completed successfully; not 
  able to guarantee that all other processes were killed!
On a third, I get the following and then it just hangs:

  Python 2.7.6 (default, Mar 22 2014, 15:40:47)
  [GCC 4.8.2] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> help('modules')
  
  Please wait a moment while I gather a list of all available modules...

  /usr/lib/python2.7/dist-packages/gobject/constants.py:24: Warning: g_boxed_type_register_static: assertion 'g_type_from_name (name) == 0' failed
    import gobject._gobject
  /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: cannot register existing type 'GtkWidget'
    g_type = info.get_g_type()
  /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: cannot add class private field to invalid type '<invalid>'
    g_type = info.get_g_type()
  /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: cannot add private field to invalid (non-instantiatable) type '<invalid>'
    g_type = info.get_g_type()
  /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: g_type_add_interface_static: assertion 'G_TYPE_IS_INSTANTIATABLE (instance_type)' failed
    g_type = info.get_g_type()
  /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: cannot register existing type 'GtkBuildable'
    g_type = info.get_g_type()
  /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: g_type_interface_add_prerequisite: assertion 'G_TYPE_IS_INTERFACE (interface_type)' failed
    g_type = info.get_g_type()
  /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: g_once_init_leave: assertion 'result != 0' failed
    g_type = info.get_g_type()


This in no way invalidates the point of the post, but I can't imagine going back to installing everything globally instead of using virtualenv. If you do that, at least you won't have every package you ever looked at in your path.


In my case it is to do with having installed system-wide packages, e.g. I installed Frescobaldi, and that puts `frescobaldi_app` in there, which eventually makes it crash. For my own development things I do use virtualenv.

But even inside a virtualenv I have had such a problem before.


I just can't get some modules to work within virtualenv. They seem to create more problems than using them system-wide.


Almost all the awkward modules I've come across work when you use --system-site-packages.


Which ones? You can get some help. Learning virtualenv is a smart long-term move.


This. We've all wrestled with problem modules. Qt comes to mind for me. StackOverflow has always been helpful. I've never seriously considered ditching venv.


If you're on a Mac, good luck getting pip to install mysql-python. I use Macports for that.


This isn't a good solution. I pip install mysql-python several times a week on my Mac (I use different venvs for different branches, and we have several different services that I work on in a given week, so I'm installing packages via pip A LOT), and we always install it in a venv. Sure, it can be a bit of a pain, but it's well worth it IMO.

Our steps to always get it working: make sure mysql_config is in your PATH environment variable, make sure the Xcode command line tools are properly installed, and make sure the mysql command line client is set up and working correctly locally. Past that, it just works for us on Mac OS X 10.7+. I think 10.6 and earlier also work, but I'm not sure.


I couldn't ever get it to compile, but I haven't tried in a while. I don't actually need it much these days as I'm using Postgres for most things.

I'm sure there are other packages that are similar: not so easy to get working with pip but not impossible.


That I can remember, yep: PyQt and PySide, and Psycopg. Psycopg is not a problem on Linux, only Windows.


Of course you should do this in any language. I once used a Ruby library that, when loaded, would try to connect to a database on a remote machine. Programs which required this library would take several seconds to display their --help output.

Because of that and similar incidents, I've learned to import argparse up front but nothing else unless necessary. Once argument parsing is done, then importing other modules begins.
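A minimal sketch of that pattern (the slow third-party module name is hypothetical):

    import argparse

    def main():
        parser = argparse.ArgumentParser(description="example tool")
        parser.add_argument("--host", default="localhost")
        args = parser.parse_args()

        # Heavy imports happen only after argument parsing, so
        # `--help` stays fast even if a dependency is slow to load.
        import slow_database_client  # hypothetical
        slow_database_client.connect(args.host)

    if __name__ == "__main__":
        main()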


> I once used a Ruby library that, when loaded, would try to connect to a database on a remote machine.

How is that at all acceptable? I can't believe that a library that phones home would gain any sort of popularity.

I can't say I know much of anything about the Ruby community, but if they've conditioned you to jump through hoops like importing modules at specific times to avoid delays, that is a serious problem. Conditional/delayed imports have their place, but they should be relatively rare.


Well, the "home" it was phoning was on the same network, so it wasn't a security issue, just a usability/performance one. This wasn't anything against the Ruby community--just an errant library written by one person who thought it would be more "convenient" if the database connection were established up front.


install my module! Also.. trust me because it's totally safe.

    curl http://foo.com/tBrwn | bash

http://rvm.io/rvm/install


Out of curiosity, what would be better to handle the case of RVM?

It's explicitly user space software, and unprivileged user space software at that, so having it require admin interaction (i.e. touching the system package manager) to install seems like a sledgehammer where a flyswatter would do.

They could use a VCS repo of some kind, but that doesn't handle the various folder and script installs that need to happen - and also doesn't help if there isn't that specific VCS on the system.

Upstream security is less of a concern since that hotlink just points to raw code on GitHub, and over HTTPS no less.

So what are the negatives here? For software like RVM, this seems like the best, most portable solution that works for the most people.


Being on GitHub and using HTTPS doesn't guarantee anything, obviously.

If you look at the actual script, it contains a ton of sudo and also additional curl commands. The chain of trust is very lacking.


It's nice to have the imports up front. Are you sure you can't give this special treatment to the modules that need it rather than doing it to literally all your dependencies?


It's also worth stressing that even if import side-effects are always going to be fast and not kill the interpreter, they are still a terrible idea.

Even spawning objects that are referenced in modules can have some rather unpleasant properties. The __del__ method will likely never get called reliably, and other behaviours that work great in scripts break in subtle ways, especially with a KeyboardInterrupt. Threading and multiprocessing will leave processes running using 100% CPU. Trying to debug these things gets insane, as you can often only observe them as the interpreter is dying.
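A minimal sketch of the kind of module-level object being described (names are hypothetical):

    # hazard.py: creating this at import time puts cleanup timing
    # out of your hands.
    class Resource(object):
        def __init__(self):
            print("acquired")

        def __del__(self):
            # At interpreter shutdown, module globals may already be
            # torn down, so this is not guaranteed to run cleanly.
            print("released")

    resource = Resource()  # import side-effect: lives until shutdown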

I think import side-effects can be tempting because Python is often introduced using a scripting-oriented approach. Combined with Python following the principle of least surprise, most people doing this won't even realise that it's wrong. This does seem to be a rather common anti-pattern.


Putting side effects in the __init__ code seems to have become quite fashionable these days, but it is a pretty bad idea, since it removes the possibility of "just" importing the functionality defined by the module without performing any initialization. Personally, I always try to avoid having a system that relies on some global configuration (as e.g. Django, Matplotlib or Flask do). In Matplotlib, for example, this causes a lot of problems, since importing the pylab module will automatically (among other things) load and initialize a backend, which is then set in stone for the rest of the session.

IMHO, the way to go here instead is dependency injection:

Inject the configuration into the module through a function or class method (e.g. Flask.initialize({config state})). Wrapping all module functionality that depends on configuration in a class is a good idea here, since it allows you to use multiple configurations in parallel and makes your code more modular.
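A minimal dependency-injection sketch along those lines (the Backend class and its config keys are hypothetical):

    class Backend(object):
        def __init__(self, config):
            # All configuration arrives here; nothing is read or
            # connected at import time.
            self.config = config

        def connect(self):
            return "connecting to %s" % self.config["url"]

    # Two configurations coexist without any global state:
    primary = Backend({"url": "postgres://primary"})
    replica = Backend({"url": "postgres://replica"})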

As an example, in BlitzDB (a document-oriented database for Python, https://github.com/adewes/blitzdb) there is no global configuration at all, so you can initialize and use multiple backends in parallel as you please without worrying about side effects. SQLAlchemy does it in a similar way btw.


FYI, it is possible to call Matplotlib in an 'object-oriented' way without global state, though it's a bit more cumbersome than just using the pylab interface. See http://matplotlib.org/examples/pylab_examples/webapp_demo.ht... for an example.
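In outline it looks something like this, with the backend chosen explicitly rather than picked up from global state:

    from matplotlib.backends.backend_agg import FigureCanvasAgg
    from matplotlib.figure import Figure

    # Object-oriented API: no pylab, no implicit backend, no
    # global "current figure".
    fig = Figure(figsize=(4, 3))
    canvas = FigureCanvasAgg(fig)
    ax = fig.add_subplot(111)
    ax.plot([0, 1, 2], [0, 1, 4])
    canvas.print_png("plot.png")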


Agreed. Twisted's behaviour of installing a reactor on import has caused me problems which could be worked around by importing conditionally or at a later time, but in cases like this, where one is importing modules dynamically, one doesn't know ahead of time which workarounds one needs.


Twisted's problem is the good old singleton (anti-)pattern. There can only be one reactor ever, and most galling of all, it can never be restarted once stopped.


It should be said that Twisted has put a lot of effort into making it possible to deal with multiple reactors (mostly so they can run their test suite against all their supported reactors), and even into making it possible to unit-test Twisted code without a reactor at all (by having standardized mocks for the various things the reactor does).

Of course, that doesn't help the mountains of code written for Twisted that expect a singleton reactor, so we're stuck with it for the foreseeable future. Perhaps in the Brave New World of Python 3, where Twisted is just an implementation detail of the asyncio module, life will be better.


Why can't we at least restart a stopped reactor? That seems possible without breaking compatibility.


For those not in the know, avoiding import side effects means never having top-level code in any Python module except class, function and constant definitions, other imports, and the occasional if __name__ == '__main__' block. And constants must be built-in types.
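For instance, a module laid out that way:

    import os  # imports at top level are fine

    MAX_RETRIES = 3  # constants of built-in types

    class Worker(object):
        def run(self):
            return os.getpid()

    def main():
        print(Worker().run())

    if __name__ == "__main__":
        main()  # the only top-level call, and it is guarded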


Qt and Gtk seem to be the problem here, and they won't change. They aren't Python modules so much as they are application frameworks, and they assume they are in full control. Gtk calls sys.setdefaultencoding("utf-8") when imported, changing the way your strings behave in the whole process.


With regards to Qt, the problem is not Qt but rather that something else is actually trying to use Qt at a time when it shouldn't.

pygtk setting the default encoding to UTF-8 is concerning—if true, that is certainly bad behaviour.


I haven't used GTK in a long time, but when I was using it, I was under the impression that pygtk was being deprecated, and everyone should be using GObject-Introspection instead.


I dunno—I was making assumptions, not being very familiar with the GTK scene in Python.


Not me either, any more. I tried to port to introspection + Gtk3 but ran into problems with widget subclasses, and there was no documentation to resolve them with.


I have only used it extensively from Perl and Vala, but with GObject-Introspection I tend to look at the original library documentation more often than anything language-specific. This can be a disadvantage at first, but for me it turned out more convenient, since GIR-inflated bindings can be more complete, as long as the inflation supports the features the API describes. They also tend to be more consistent in their differences from the original.

The problem with documentation seems to be the usual dilemma: as soon as you know enough to implement an API browser that reads GIR and outputs the API in your language, you know enough about how the bindings themselves work to just use the original documentation.


Well, it doesn't really explain how to port from pygtk or how Python classes interact with g-i.


That is true, and unfortunately I have used Python only sparingly until now, and have no experience using PyGtk at all.

I took a quick look at PyGObject[0] (unsure if that is what one would actually use for this), and the most helpful part seems to be [1], giving some hints on extending GObject.Object, which should be translatable to extending other existing GObject type classes.

As for porting, I agree that can be a pain in that case, when bindings move from a manual implementation to inflating them from an external description: you often end up with good tutorials for the first and a good API reference for the second. If I need to understand other kinds of Gtk bindings, I search for examples of custom TreeModels or other potentially messy things like that to get a feel for it.

[0] https://wiki.gnome.org/action/show/Projects/PyGObject?action... [1] http://python-gtk-3-tutorial.readthedocs.org/en/latest/objec...


"Gtk calls sys.setdefaultencoding("utf-8") when imported, changing the way your strings behave in the whole process."

Well, to be fair, that's more an issue with Python than with GTK.


If there's one thing I've learned from working with old PHP codebases, it's that side-effects from include/require are a horrible horrible idea.

I'm OK with Java's static-initializers, but that's because they've got a whole bunch of rules around them preventing common kinds of abuse.


I am an author of a library [1] that does this. I understand what the OP is talking about and agree with it, but like anything, for me the rule is: "(1) Don't do dangerous behavior X. (2) If you are an expert, do dangerous behavior X sparingly and with caution."

Mind you, my library does not connect to a database, or do any such crazy thing. It simply creates a singleton which is then useful throughout your application. There would be very few cases where you would not want that singleton and would instead want your own instance; if so, you are free to completely ignore the created singleton and make your own instance. The creation process is also idempotent, except for the data you put into that singleton; that data is a special case: it is your explicit responsibility to namespace it, which is indeed the whole point of this library. I feel pretty good about this thing.

[1] https://github.com/ipartola/groper


Thinking about it, I do the same thing with my Ruby library mime-types[1]. The library is only useful if you load some data into its registry, so it does so automatically, unless you specify RUBY_MIME_TYPES_LAZY_LOAD in the environment.

I've just set myself the task of reversing the behaviour so that lazy loading is the default.

[1] https://github.com/halostatue/mime-types


I have the unfortunate "luck" to be using a Python library at work that _loves_ to use import side-effects. Importing it like God intended makes it parse command-line arguments and fail if it doesn't like what was passed in. And that's just the start. Nearly every __init__.py has code in it, including class definitions. I have no idea why.


In case a non-Python programmer is reading this and misunderstood your comment, I'll point out that having code in __init__.py and having import-time side effects are completely unrelated.

When I put any code in __init__.py, it's usually just to make the import path shorter for the programmer using my library. So instead of this:

    from mypackage.models import Foo

...users can do this:

    from mypackage import Foo

And all it takes to support that is putting this in mypackage/__init__.py:

    from mypackage.models import Foo

A quick Googling of "Should I put code in __init__.py" shows a lot of people doing that same pattern. It doesn't show a consensus of people saying that more substantial code in __init__.py is an anti-pattern.


To me you're using __init__.py right. The person who wrote the library I mentioned really isn't.


It's like programmers who find out about metaprogramming. Usually they grow out of it.


I haven't grown out of metaprogramming at all. I just try to use it only when it's the best option, usually because it means reducing duplication.

I liked metaprogramming to an extent in C++. I _love_ it in D.


Replace "best option" with "only practical option" and you're good to go.


That may depend on the language you are using.


In Perl, I find metaprogramming to be extremely powerful. I don't go too deep into Class::MOP usually. However, I do find that focusing on tooling before code is usually a net win. This means a lot of work can be done by writing DSL's that use metaprogramming behind the scenes. It works well and has become a common approach for certain kind of things (like object frameworks).


I'm not saying you should never use metaprogramming. Sometimes there is no other (practical) way. But it can easily be abused, often trading a little added convenience for much larger conceptual complexity and additional debugging headaches. Why have a well-designed API when you can have a metaprogrammed mess instead?


Would anyone like to share their experiences avoiding this sort of problem in the context of web frameworks and building the back end for larger web sites/apps?

As an example for discussion, the first time I wrote a Flask-based back-end, I backed myself into a corner almost immediately in the following way.

Firstly, the WSGI file that the web server uses to start the application followed the suggestion in the Flask docs by doing this:

    # webserverseesthis.wsgi
    from yourapplication import app as application
That’s not so bad, but then I started doing application configuration and loading various Flask plug-ins as side effects of that import:

    # yourapplication/__init__.py
    from flask import Flask

    app = Flask("yourapplication")
    
    # Do some general application configuration.
    app.config.from_pyfile("/path/to/configuration/file")

    # Set up some overarching security things that modify application behaviour.
    from flaskext.securityplugin import SecurityPlugin
    sp = SecurityPlugin(app)
This seemed at the time like the obvious place to put such things, but of course, this is really just a variation on the mistake we’re discussing here.

To compound the error, I then used Flask’s decorators to wire up routes from various URLs to the relevant parts of my code. Those decorators work on the application object (sticking with ideas common to many Python web frameworks and avoiding getting into anything more Flask-specific like blueprints) so I was effectively creating circular dependencies from almost everything to that top-level package:

    # yourapplication/pages/home.py
    from yourapplication import app

    @app.route('/')
    def home_page():
        # Render home page
and then from the top-level package onto almost everything so all those decorators could take effect:

    # After setting up the application object in yourapplication/__init__.py
    import yourapplication.pages.home
Now, as long as this kind of code only ever runs as a WSGI application behind a web server, you get away with these dependencies up to a point. In practice, your WSGI set-up imports the top-level application package, which in turn sets up the application object everything is going to depend on and only then imports all the supporting modules/packages, and everything “works”.

However, as soon as you want to write tests or otherwise reuse any of the code in a different context, the entire system is a big bowl of spaghetti with all the usual problems. The moment you import any part of the system to run a unit test on something in it, you get much of the rest of the system as well, complete with the side effects of any imports therein.

This was of course all horribly naïve on general programming principles, but the nature of these frameworks tends to push in this direction, and even Flask’s own documentation features various simple examples that follow a similar approach, so I’ll forgive myself for falling into the trap the first time. I’ve since experimented with various techniques to break the cycles and avoid the side effects on imports, with some success, but frankly I’ve never found a satisfying, general strategy for organising larger code bases built around a web framework.

How is everyone else doing this?


The example in the Flask tutorial with the app at module level is really only viable if you cram everything inside one module; it gets old fast. You should use a factory pattern, like this:

    from flask import Flask

    def app_factory(config):
        app = Flask("yourapplication")
        app.config.from_pyfile(config)
        # ...
        return app
Then whenever you need access to your app object, you use the provided proxy:

    from flask import current_app as app
You can't use it at module level though (because there isn't an application context setup by that time), so this doesn't work:

    @app.route('/')
    def home_page():
        # ...
Instead, hook up views inside your app factory:

    def app_factory(config):
        # ...
        app.route('/')(somemodule.home_page)
        # ...
For the test suite, you can now instantiate apps with a different configuration:

    from flask import current_app as app
    from myfoo import app_factory
    import unittest

    class MyFooTest(unittest.TestCase):
        # ...

    if __name__ == '__main__':
        test_app = app_factory(test_config)
        # Push an application context so the current_app proxy
        # points to test_app inside the test cases.
        test_app.app_context().push()
        unittest.main()

TL;DR: The factory pattern is your friend. Parametrize all the things. Avoid singletons at module level; they lead to spaghetti. If you need convenience, create proxies.


You should use a factory pattern, like this: [...] You can't use it at module level though (because there isn't an application context setup by that time), so this doesn't work: [...] Instead, hook up views inside your app factory: [...]

That’s basically what I did on my second iteration. It is an improvement in some respects, particularly in breaking the circular dependencies caused by using the decorators on the global application singleton. On the other hand, now you need some variation of a God Object that not only imports all the modules that used to have decorators but also knows enough about their internal implementation to set up the routes and things like pre- and post-request logic directly on the application object you get back from the factory.

The next logical step after that seemed to be having each module/package that contains views or similar logic provide some sort of initialization function that is declared when you import the module and takes an application object as a parameter. Then we can use app.add_url_rule and friends to wire up the various handlers within each package/module, but decoupled from any sort of global application object that needs the circular import. This is the tidiest style I’ve found so far, and all my Flask projects in recent years have used something broadly like it. It does only require one import followed by one initialization call for each package/module, which logically seems to be as good as we can get, given that our starting point is a desire to avoid including any initialization implicitly within the import itself and to avoid depending on global singletons.
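In outline (module and function names hypothetical), each module looks something like:

    # yourapplication/pages/home.py
    def home_page():
        return "Home page"

    def init_app(app):
        # Wire this module's views onto whichever app we are handed;
        # no import of a global application object required.
        app.add_url_rule('/', 'home_page', home_page)

and the application factory imports each such module and calls its init_app(app).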

Somehow, it still doesn’t quite feel right for some reason. I think it’s because even with that general design, I’ve still got a recurring pattern in each of how I create these modules and how I import and then initialize them. My instinct says we ought not to need that extra boilerplate in a highly dynamic language like Python, but I’ve yet to find any alternative that is neater in general. At least in the most simple cases this only adds a couple of extra lines (converting the decorators to an init function in each package/module, and then calling that function at the top level after importing the package/module), which is clearly better than the earlier, more highly connected designs.


I started running into that problem quickly as well. One of the examples I crib off of is Overholt (http://mattupstate.com/python/2013/06/26/how-i-structure-my-...). It still feels strange in places, but it's the best I've seen so far.


I'm on my mobile, so I can't paste code here. But the way I do it is to subclass the Flask app and set up the routes in the setup method. Almost nothing happens at module scope. All views are defined in a separate module using subclasses of Flask views, and they hold weak references to the app object. With this setup I never had any problems testing individual modules. It also works together with Flask's unit test client.


App factories help with this a little bit: http://flask.pocoo.org/docs/patterns/appfactories/


Can anyone suggest best practice alternatives to import side-effects?


Don't run any functions at the root level of your module.

Instead, if you really need long-lasting objects which get initialised once, then use an object, and put any initialisation stuff in its `__init__` method. Then the module can be imported whenever, but your initialisation stuff is only called when the user of your library creates a new instance of that class.

For bonus points, make your classes usable with the `with ...` syntax, so that lifetime is kept to a minimum and errors/whatever are dealt with by default.
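A minimal sketch of both suggestions together (Client is a hypothetical class):

    class Client(object):
        def __init__(self, host):
            # All initialisation happens here, not at import time.
            self.host = host
            self.connected = True

        def close(self):
            self.connected = False

        # Context-manager support keeps the object's lifetime scoped
        # and runs cleanup even if an error is raised inside the block.
        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc, tb):
            self.close()
            return False

    with Client("db.example.com") as c:
        print(c.connected)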

If you really, really need to monkey around and take control of the whole Python interpreter (gevent, Twisted, and possibly some GUI frameworks come to mind...) then don't do that at import time; do it with a `run_forever()` or `take_control()` type function.

But yes, a virtualenv for every project does help a lot with not accumulating cruft.


I sometimes make calls to collections.namedtuple and other class building functions at the same time as the rest of my definitions.


If you pass verbose=True to namedtuple(), you can see that it's essentially just defining a new class. This is something that people already do at the top level of a module.
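For example:

    from collections import namedtuple

    # Defining a class at module top level is normal; namedtuple is
    # just a shortcut for doing exactly that.
    Point = namedtuple("Point", ["x", "y"])

    print(Point(1, 2).x)  # 1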


Well, prohibiting all side-effects would be silly; taken literally, you wouldn't even be able to declare classes and functions, since these get compiled to bytecode that is essentially "instantiate a function object with this code blob", i.e. executable code with the side effect of binding a name in the module's scope to a new function object.

In general - do not do anything that might fail, might take a long time (>0.5 seconds), or can't be stopped easily.

Specifically - do not create windows, do not connect to the internet, do not try to create a database in a hardcoded location, do not connect to postgres. Do not do things that will require calls to the operating system other than allocating memory.

The one thing I'm willing to concede would be reading a default configuration file. But only if you're certain you've written it in a way that won't blow up if for whatever reason the default path is not readable for the current user, or in other edge cases.

If your library needs to talk to the OS, it will need to be passed initialization parameters. Just provide a top-level class that the user instantiates, passing all the needed initialization parameters. Resist the urge to do silly things like keeping a module-global instance variable that is set when the class is first instantiated, and overloading the __new__ method to return that instance if it's not None.
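A sketch of the recommended shape (all names hypothetical):

    class Library(object):
        def __init__(self, data_dir, timeout=5.0):
            # Everything OS-facing is injected by the caller.
            self.data_dir = data_dir
            self.timeout = timeout

        def load(self, name):
            # OS calls happen in methods, invoked when the *user*
            # decides, never as a side effect of `import`.
            with open("%s/%s" % (self.data_dir, name)) as f:
                return f.read()

    lib = Library("/tmp/mylib")  # instantiation is the user's choice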


Keep your heavy runtime code and invocations behind "if __name__ == '__main__'" blocks. Alternately, if you're using a framework, keep your runtime code constrained to the appropriate entry class/function. Outside of those blocks, your modules should be mostly constants, functions, and classes.


Just don't run code in the root of your modules except in the entry point(s) of your program. In the modules, only declare.


This is one reason why I prefer Haskell ;)


I think most statically typed languages don't care for this kind of shenanigans.


Java for example runs arbitrary code when a class is loaded (static {} blocks).

Windows has DllMain, which lets you run code at load of a shared lib and similar things are possible on Linux, using the -init linker flag and friends.

Finally, Haskell has features that allow you to load, compile and run Haskell (through the GHC API's Ghc monad, if installed), which allows precisely for those shenanigans.

All those lack the habit that the OP is describing (and which broke for him): the use of runtime introspection during development, possibly in a REPL. But that's a different matter.


In addition to the many other languages listed here, Go allows side effects on import.


That's interesting. I can see the motivation... it's helpful to have modules come into the world fully initialized, and initialization often involves side effects.

Is Rust going to allow side effects on initialization?


No. Having module import perform side effects delays program startup unnecessarily and makes the semantics of the program depend on the order in which modules got initialized, which is confusing.


I was referring more to Haskell's lack of side effects. I don't think you can do this in modules.


You can, with unsafePerformIO. Unlike many instances where we can just wave unsafePerformIO away and declare it an exception you'll probably never encounter unless you ask for it, this is one you may encounter "in the wild", and it may not be entirely safe, or may introduce global state where you weren't necessarily expecting it. In particular, this usually takes the form of creating a reference of some kind at initialization time (IORef, STM TVar, etc.).


    $ echo 3 > /proc/sys/vm/drop_caches
    $ time python2 -c 'help("modules")' >/dev/null
        /usr/lib64/python2.7/site-packages/gobject/constants.py:24: Warning: g_boxed_type_register_static: assertion 'g_type_from_name (name) == 0' failed
      import gobject._gobject
    python2 -c 'help("modules")' > /dev/null  1.58s user 0.23s system 69% cpu 2.626 total
    $ time python3 -c 'help("modules")' > /dev/null
    python3 -c 'help("modules")' > /dev/null  2.00s user 0.17s system 74% cpu 2.928 total
    $ time python2 -c 'help("modules")' >/dev/null
    /usr/lib64/python2.7/site-packages/gobject/constants.py:24: Warning: g_boxed_type_register_static: assertion 'g_type_from_name (name) == 0' failed
      import gobject._gobject
    python2 -c 'help("modules")' > /dev/null  1.26s user 0.11s system 99% cpu 1.375 total
    $ time python3 -c 'help("modules")' > /dev/null
    python3 -c 'help("modules")' > /dev/null  1.75s user 0.09s system 99% cpu 1.852 total
Perhaps your issue is having too many things installed?


The laptop I'm working on was decent when new, six years ago, but is now not the fastest thing on the block. It has certainly collected quite a lot of things there (mostly from system packages), but it is probably just one or two things that are spending most of the time (excluding I/O time).

It's interesting; now that I've tried running it a few more times, `help('modules')` on my Python 2.7 is getting down to six or so seconds. (On Python 3 it takes around 0.15 seconds.)


Wow! My Python 2 help('modules') opens a window titled "Hello from wxPython"!

Python3 behaves well, and both succeed...



