Robbie Morrison. https://doi.org/10.1016/j.esr.2017.12.010. Open access under a Creative Commons license.
- Growing calls for policy-based energy system models to be “opened up”.
- Energy system modeling projects are adopting open source development methods.
- Energy system database projects are co-evolving to serve datasets to open models.
- Source code distributed under standard copyright cannot be legally used, built, or run.
- Source code and datasets distributed under standard copyright cannot be republished.
A mid-2017 survey shows that 28 energy system modeling projects have made public their source code, up from six in 2010, and none in 2000. Another six web-based energy sector database projects and nine hybrid projects were established during this same period, some explicitly to service open modeling.
Three distinct yet overlapping drivers can explain this shift in paradigm towards open methods: a desire for improved public transparency, the need for genuine scientific reproducibility, and a nascent experiment to see whether open source development methods can improve academic productivity and quality and perhaps also public trust.
The associated source code, datasets, and documentation need suitable open licenses to enable their use, modification, and republication. The choice for software is polarized: teams need only consider maximally permissive (ISC, MIT) or strongly protective (GPLv3) licenses. Selection is influenced by whether code adoption or freedom from capture is uppermost and by the implementation language, distribution architecture, and use of third-party components. Permissive data licenses (CC BY 4.0) are generally favored for datasets to facilitate their recombination and reuse. Official and semi-official energy sector data providers should also prefer permissive licensing for copyrightable material.
Calls to “open up” energy system models are growing, particularly for those models used to inform public policy development. Simultaneously, a number of energy system projects are releasing their source code under open software licenses and starting to build user and developer communities. In parallel, several open energy sector database projects have been established to collect, curate, and republish the datasets needed by these models. This seismic change in practice is reviewed, together with the legal issues, mostly due to copyright, that enable and constrain these activities.
There are three distinct yet overlapping motivations for making energy system models open: improved public transparency as a reaction to sustained criticism over policy opaqueness, scientific reproducibility as a response to concerns over minimum scientific standards, and open development as an attempt to leverage the benefits that open source software development methods can offer. These three motivations can be seen as a continuum, with public transparency as the least ambitious and open development the most.
While this article is aimed at energy policy models, much of what is discussed is likely to be applicable to other computational domains, such as the numerical modeling of urban air quality, economic systems, and climate protection strategies.
The legal examples provided reference either US or German law, primarily because these two jurisdictions are responsible for most of the litigation on open licensing and consequently most of the analysis.
Some recent appeals for greater openness in energy system modeling [1,3,6,7,10] remain silent on the issue of software licensing, presuming perhaps that source code can be lawfully used once published. This is correct if and only if open software licenses are provided. Otherwise, standard copyright prevails and precludes all usage beyond simple inspection. In contrast, datasets under standard copyright can probably not be legally machine processed, although the legal analysis on this matter remains extremely limited (discussed later).
The situation concerning the republication of source code and datasets under standard copyright is clear. It is a breach of copyright to publicly distribute code and data if open licenses are absent. This means that encumbered code and data cannot be republished to support public transparency, scientific reproducibility, or open development. Hence, DeCarolis et al. [4:1849] erroneously conclude that “models with open source code and data but with no license are assumed to allow redistribution without any restrictions”.
Indeed, only open licenses can unequivocally grant the right to study, use, improve, and distribute the associated code, data, and content — known as the four freedoms and first articulated for software by the Free Software Foundation (FSF) in February 1986 [11:121–122].
But open software licensing is as much a development model as it is a legal instrument [12:ii]. Open development implies that projects actively build communities by using code sharing platforms, social media channels, periodic workshops, and other forms of engagement. Open development should be seen as aspirational: it is not a necessary condition for public transparency or scientific reproducibility.
Open data has only really become an issue for energy system research with the advent of open modeling. Prior to that, closed source projects could purchase and use proprietary information under non-disclosure agreements (NDA). Or they could employ publicly available copyrighted data without attracting attention. In contrast, fundamental research domains like climate modeling have long shared unencumbered code and data. But energy system models need information from official and semi-official sources, including system and market operators, with much of it privately held. These operators and their umbrella organizations have, thus far in Europe at least, been reluctant to open license their public datasets or release key engineering information, leading to the current impasse and giving rise to crowdsourced projects to circumvent at least some of these limitations. It is presumed in this article that such information meets the legal threshold for copyrightability.
Selecting code and data licenses for an open energy system project can be daunting. This article discusses the issues involved and provides some guidance. The computer language used to implement a model can have a significant influence on the choice of software license, distinguished thus: compiled languages (C++, Java, C), interpreted languages (Python, R), and translated languages (MathProg, GAMS).1 So too can the selected distribution architecture and the third-party libraries and source code that the project intends to either utilize or make available to other projects. These various considerations are strongly coupled.
This article proceeds thus. First, public transparency, scientific reproducibility, and open development are reviewed. A short audit of open energy system projects follows. Next, standard copyright and open licensing are examined. Attention then turns to the specifics of code and data in relation to open modeling, including the selection of suitable licenses.
2. Public transparency
Public transparency is a public policy ideal which requires, at the least, that the model in question be fully documented and that the datasets used be made available for inspection, but neither necessarily under open licenses. Some authors prefer to term the headline concept comprehensibility rather than transparency [3:2]. The qualifier public is used to exclude other less onerous forms of transparency, such as providing peer reviewers with secondary material.
Public transparency should help discourage what Geden [13:28] terms “policy-based evidence-making” in contrast to evidence-based policymaking. Both the model framework and its underlying design (expressed in code) and the selected scenarios (represented as data) embed limitations and assumptions that merit close scrutiny.
Acatech et al. [1:16–17] suggest that public transparency is best served with layered publishing for different audiences, ranging from short policymaker summaries to technical reports in sufficient detail to enable the results to be replicated. Cao et al. [3:4] consider (grammar corrected) “open source approaches to be an extreme case of transparency that do not automatically facilitate the comprehensibility of studies for policy advice”. While that may be true, open development can also enhance transparency. Diligent open source projects produce clean code and good documentation, if only to service their own needs. Wiese et al. argue that the public trust needed to underpin a rapid transition to zero carbon energy systems can only be built through the use of transparent open source energy models. Opaque policy models simply engender distrust. Strachan et al. [14:2] suggest that closed energy models that provide public policy support “fall far short of best practice in software development and are inconsistent with … publicly funded research”. The Deep Decarbonization Pathways Project (DDPP) seeks to improve its modeling methodologies, a key motivation being “the intertwined goals of transparency, communicability and policy credibility” [15:27].
The oft heard call that models should publish their equations needs some examination. Mathematical programs, typically linear (LP) or mixed-integer (MILP) and written in an algebraic modeling language (MathProg, GAMS), can list their equations over some few pages because their codebase is essentially the programmatic expression of these equations [16:5835–5836]. But a sophisticated simulation/optimization framework (implemented in say C++ or Python) may need hundreds of pages to adequately record its workings. For instance, the core of deeco is documented in a 145 page PhD report and a 239 page user manual, with later enhancements adding proportionately to this material. It is rare (in the author’s experience) for software descriptions to be sufficiently complete and correct to enable reimplementation. Rather, the original developers must be contacted to fill in any number of absent details.
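To illustrate why a mathematical program can be documented so compactly, the following is the generic textbook form of a linear program. It is offered only as orientation and is not the formulation used by any project named in this article. The symbols are assumptions introduced here: x is the vector of decision variables (for instance, dispatch levels and installed capacities), c the associated cost coefficients, and A and b the coefficients of the technical and balance constraints.

```latex
\begin{aligned}
\min_{x} \quad & c^{\top} x \\
\text{subject to} \quad & A x \le b, \\
& x \ge 0
\end{aligned}
```

A mixed-integer program has the same shape, with the added requirement that selected components of x take integer values. In an algebraic modeling language, each constraint family in A x ≤ b typically corresponds to a handful of source lines, which is why the equation listing and the codebase are so closely aligned.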
Allied to the notion of public transparency is that of market transparency [19:9]. Market transparency measures include the 2013 European electricity market transparency regulation 543/2013, intended to improve market liquidity and system security and also the standing of minor players. The regulation requires that transmission system operators and wholesale market operators collectively gather, aggregate, and publish both electricity market and system reliability information. The machine use of this data, but not its republication, is permitted (discussed later). Moreover, the datasets thus provided need only be made available for five years and can then go dark.2
Paywalled research literature presents a significant barrier to public transparency, while unembargoed open access (OA) publishing promotes scientific reproducibility and open development. Free-of-charge provision of material under standard copyright and OA publishing under open content licenses are distinct concepts with differing objectives and attributes.
3. Scientific reproducibility
Replication is the “ultimate standard by which scientific claims are judged” [6:1226]. Replication, in the context of energy system modeling, would mean reimplementing the software and collecting the input data anew. As a consequence, replication in the computational sciences is rarely feasible, so reproducibility represents the minimum attainable standard. Reproducibility means taking the existing code and data and repeating the analysis. Even so, reproducibility is no guarantee that the original results are correct and the conclusions valid.
The reproducibility spectrum ranges from making only the source code available to providing the code, make files, datasets, and executables [6:1226]. Ince et al. argue that code must be published for reasons of reproducibility, although they remain silent on the question of licensing. But other researchers must be legally free to experiment with the code and data. Some practitioners believe reproducibility requires a step change in scientific culture, covering research practices, incentives, funding policies, and publishing norms.
In the context of energy system modeling, DeCarolis et al. [4:1850] argue that repeatable analysis can only be achieved when the source code and datasets are jointly placed under publicly accessible version control so that independent researchers can select, run, and check specific model instances.
The rights to inspect, use, modify, and republish the code and data are fundamental conditions for scientific reproducibility. Only open licensing can guarantee these conditions.
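On the practical side, repeating an analysis requires knowing that the datasets in hand are byte-for-byte those used originally. One common way to support this, sketched below, is to record a checksum manifest alongside the code under version control. This is an illustrative sketch only and not a method described by the cited authors; the function names `sha256_of_file`, `build_manifest`, and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file under data_dir (relative POSIX path) to its digest."""
    return {
        path.relative_to(data_dir).as_posix(): sha256_of_file(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(recorded: dict, current: dict) -> list:
    """List files added, removed, or altered since the manifest was recorded."""
    names = set(recorded) | set(current)
    return sorted(name for name in names if recorded.get(name) != current.get(name))
```

A manifest built at publication time can be committed with the model code. An independent researcher can later rebuild the manifest from their own copy of the data and call `changed_files` to confirm that nothing has drifted before rerunning a model instance. A checksum establishes only that the files match; it says nothing about whether the data may be lawfully used or republished.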
4. Open development
The third motivation is open development. Open development is shorthand for the use of internet-mediated open source development techniques and practices. Key attributes of open projects include: unrestricted participation, status through contribution, extreme transparency, an emphasis on consensus with voting as a last resort, and minimal but sufficient governance. Open development has its roots in the free software movement, which has produced the GNU/Linux system, the GNU GCC compiler, and the LibreOffice suite, to name but a few.
Open development forms a part of the nascent collaborative commons, a term coined by Rifkin which is mostly digital in nature and enabled by internet technologies. Raworth predicts an increasing role for this new sector, placing it alongside the state and the conventional economy in terms of importance in the sustainability age.
The relationship between open development and the scientific method is an interesting one. Eric Raymond attributes the success of complex open source software projects like the Linux kernel, currently numbering 1600 developers, to the “massive independent peer review” process that accompanies such projects. Notwithstanding, the degree to which open source development practices can contribute to the computational sciences remains largely unknown.
Some of the characteristics (and excitement) of open development are captured by Linux kernel developer Greg Kroah-Hartman recounting his first experiences of contributing code [29:14]:
I wrote a driver over the weekend and submitted it, and I swear within an hour people came back pointing out problems and telling me: “This is wrong, this is wrong, this is wrong”. It felt awesome. They were critiquing my code, and I was learning from it, so I said: “Yes, you are right. This is wrong, this is wrong, and this is wrong”. I iterated and fixed problems with it. It got accepted into the kernel. It was fun. I think feedback is very important. That feedback loop of people pointing out errors or problems with what you’re doing is [a] very traditional, I guess scientific, method. And I love it. That’s how we get better.
Open development can potentially help promote public transparency and build trust. Open source software projects have traditionally been adept at engaging newcomers (as above), whether for recruitment or to extend their user base. The OSeMOSYS project is clearly the most advanced in this regard within the open energy modeling policy domain, using a range of channels to communicate with users and developers and with a wider energy policy audience [9,16,30].
Open development can offer another virtue. It is sometimes suggested that closed energy system models developed by particular research institutes are designed, calibrated, and run to produce certain outcomes, including in relation to the future ranking of technologies [31:389–390]. Whether true or not, open development will naturally encompass a range of views that can help to identify and reduce such biases. Energy scenarios and energy models, while useful, have clear limitations that should be debated candidly and not over-interpreted.
Open projects often seek end-to-end openness and Fig. 1 shows an open modeling pipeline. Data from official and semi-official sources or collected by the public enters on the left side. Code development occurs in the middle. Scenarios, defined in part through public engagement, enable specific models to be formulated and run. Interpretation, followed by scientific and gray publishing and outreach, takes place on the right side.
An open platform is not essential for public transparency or scientific reproducibility, but open source projects tend to avoid proprietary environments, if only for reasons of financial cost and vendor lock-in. In the context of the diagram, a toolchain describes the set of programming tools and system libraries required to build software in a particular language. The preferred languages for open energy system projects are, in rough order of popularity: Python, Java, MathProg, C++, R, Ruby, and C. Julia is being mooted as a numerically superior replacement for Python. There is a clear trend towards the use of interpreted languages.
The most commonly encountered closed platform for energy modeling is the GAMS language and integrated development environment (IDE). The significant cost of GAMS (upwards of USD 3200) effectively limits participation to those who can access an institutional copy.3 The success of OSeMOSYS can be attributed in part to its choice of the MathProg language and GLPK translator [16:5854].
Future energy system scenarios should be developed using public consultation. There is a large body of law on public consultation adequacy, but the topic falls outside the scope of this article. Even so, public outreach is increasingly viewed as an important activity for scientists.
While not an open licensing issue directly, the choice of distribution channel may have a significant influence on the course of a project. Common methods include public code hosting sites, institutional git servers, websites providing tar files, and email on request schemes. Some projects require users to register first while others offer anonymous downloads. Yet others require individual approval by a project administrator. The git revision control system and the GitHub platform have caught the imagination of the scientific modeling community in a way that previous code hosting sites did not.4 Even so, around 80% of the public repositories on GitHub fail to include a software license of any description.
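A project can guard against joining the unlicensed majority with a very simple check on its own repository. The sketch below looks for a top-level file whose name suggests a license text. The list of name prefixes and the function name `find_license_files` are assumptions made for illustration; naming conventions vary between projects and hosting platforms.

```python
from pathlib import Path

# Assumed, non-exhaustive list of conventional name prefixes (lower case).
LICENSE_PREFIXES = ("license", "licence", "copying", "unlicense")


def find_license_files(repo_root: Path) -> list:
    """Return names of top-level files that look like license texts."""
    return [
        entry.name
        for entry in sorted(repo_root.iterdir())
        if entry.is_file() and entry.name.lower().startswith(LICENSE_PREFIXES)
    ]
```

An empty result is a warning sign that the code may be sitting under standard copyright. A non-empty result shows only that a candidate file exists: the file still has to be read to establish which license applies and whether it is an open one.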
Projects designed to be open from the outset can be better structured and documented for their eventual release. Teams can agree to write to a coding standard. They can select a software license with due consideration and ensure that only compliant software and open data are used. Legacy projects wishing to open up may find it hard to identify and locate both the contributors and their contributions and obtain the legal consents necessary for the new arrangements. If permission is not forthcoming, the associated code will need to be removed and reimplemented. Protected datasets will likewise need to be substituted by open equivalents.
A central claim for open development is that it will reduce duplication of effort. This is self-evident in the case of energy sector data, where a community effort to gather, curate, and maintain high quality energy sector databases will remove the need for individual projects to undertake similar work at a necessarily lower standard (see examples). The situation regarding code development is less clear cut. Given that closed source research projects are often episodic, some spanning only a single PhD, then migrating part of this effort to durable open source projects should yield benefits in both directions. Model consolidation offers another potential windfall. Open electricity sector modeling teams in Germany have begun to discuss model convergence, considering initially the sharing of components like data structures [9:66]. It remains to be seen whether these kinds of returns to cooperation materialize or not.
5. Open energy system projects
This section presents an audit of the 43 open energy system projects active in July 2017. Fig. 2 depicts the classification scheme used. Space considerations here preclude a proper survey. There is a strong bias towards high-resolution technical models and towards engineering and environmental information. In the scheme outlined, a framework is software that is later populated with data to create a set of models (or framework instances) based on predefined scenarios. This approach respects the design doctrine of code and data separation. The term framework is not used here in its computer science sense (although oemof has adopted this architecture).
Energy system modeling projects are either electricity sector frameworks or energy system frameworks, the latter trading detail for scope. The electricity grid identification projects are a direct result of the information deficit in relation to distribution and transmission networks. Such projects crowdsource their data, either directly or via OpenStreetMap, to infer a plausible internally-consistent model of the electricity grid under consideration, using techniques from statistics and graph theory [37,38]. The data portal projects comprise two camps. The first is the semantic wiki which employs crowdsourcing and semantic web protocols to assemble, structure, and publish energy data. Enipedia, which went live in March 2011 or thereabouts, was the first such wiki to cover the energy and allied sectors. The second camp provides custom on-demand datasets based on user selections, such as geolocation and technology. The Renewables.ninja site, launched in September 2016, serves synthetic renewable generation time-series derived from US satellite weather data [40,41]. The energy sector database projects deploy either relational databases or file servers, split evenly by number. Both approaches guarantee stable identifiers (URLs) and offer web application programming interfaces (web API) to enable programmatic access.
Table 1 indicates that much of the open energy modeling revolution is taking place in Germany, followed by the United States. Possible reasons for the early adoption by Germany include the advanced state of the Energiewende [31:379–412], the absence of official government models, favorable research funding, and a tradition of open source projects (SUSE Linux, KDE, LibreOffice), open knowledge projects (Wikipedia DE), and similar (CCC, FSFE).
Table 1. Open energy project counts by host country and type. These number 43 as of July 2017, up from six in 2010 and none in 2000. The projects that make up this census are listed at the bottom of the table.
|Country||Open energy system frameworks||Open electricity sector frameworks||Open electricity grid identification projects||Semantic wikis||On-demand datasets||Open energy sector databases||Totals|
Energy system frameworks: Balmorel • Calliope • DESSTinEE • Einstein • Energy Transition Model • EnergyPATHWAYS • ETEM • ficus • oemof • OSeMOSYS • TEMOA • WWS project
Electricity sector frameworks: DIETER • Dispa-SET • EMLab-Generation • EMMA • GENESYS • GnuAE • NEMO • OnSSET • pandapower • PowerMatcher • PyPSA • renpass • SIREN • StELMOD • SWITCH • URBS
Electricity grid identification projects: DINGO • GridLAB-D • Hutcheon and Bialek dataset • OpenDSS • OpenGridMap • osmTGmod • SciGRID
Semantic wikis: Enipedia
On-demand datasets: Renewables.ninja
Energy sector databases: Energy Research Data Portal for South Africa • energydata.info • Open Power System Database • OpenEnergy Platform • OpenEI • reegle
The first three projects to release their code (Balmorel in 2001, deeco in 2004, and GnuAE in 2005) did so for reasons of open development. The next projects (OSeMOSYS in 2008 or thereabouts and TEMOA in 2010) cited transparency [42:3] and transparency and reproducibility [43:1] [44:339] respectively.5 The Balmorel codebase contained the following comment (spelling corrected):
Efforts have been made to make a good model. However, most probably the model is incomplete and subject to errors. It is distributed with the idea that it will be useful anyway, and with the purpose of getting the essential feedback, which in turn will permit the development of improved versions to the benefit of other users. Hopefully it will be applied in that spirit.
Two recent ventures warrant mention. The SIREN (SEN Integrated Renewable Energy Network Toolkit) project from Western Australian NGO Sustainable Energy Now is the only open energy system model developed by an NGO for advocacy purposes, showing that official analysis can be countered by community software development. The Dispa-SET model from the JRC is the first energy system model from the European Commission to be released, with public hosting on GitHub from March 2016 under an EUPL 1.1 license.
Open energy system developers are now networking to advance common aims. One notable example is the Open Energy Modeling Initiative (openmod) which essentially began life as a mailing list in October 2014. The initiative deals with issues of interest to open modelers, including: good practice in open source projects, barriers to same, energy data and metadata standards, energy model classification and cataloging, open software and dataset licensing, open access to research results and publications, and software skills relevant to energy modeling.
6. Copyright law
The legal context is important because much of what can and cannot be done with code, data, and content is governed by copyright law. Copyright law grants the author (and their heirs) of an original work a time-limited exclusive right to its use and distribution. The law is intended to support preferential exploitation and thereby incentivize creative activity (in stark contrast to the open development ethos). Copyright can only protect an original expression of ideas, not the underlying ideas themselves. Facts cannot be copyrighted but, under US law, their “collection, aggregation, analysis, and interpretation” may be if these actions are sufficiently creative [47:86].6 Multiple authors are permitted and copyrights may be assigned to an institution or other legal person. Pseudonyms, including arbitrary usernames, are acceptable. Copyright infringement is primarily a civil matter, but normally only substantial contributors can litigate.
To complicate things, the interpretation of copyright law depends on whether the target is, in this context, code, data, or content. Copyright law was not developed with either source code or digital data in mind. Indeed, software only became eligible for copyright in the US in 1980 when legislation was amended to explicitly reference “computer programs”. The relationship between machine-readable data and copyright is in its infancy.
The term standard copyright is used here to indicate that the copyright holder has not stipulated additional conditions. Standard copyright is the default under law, even when no claim for copyright is expressly made. Open licenses add conditions that enable downstream use, modification, and redistribution.
There are several phrases used to describe the dispersal of copyrighted material. This article generally adopts the term distribute; other roughly equivalent language includes publish, make available, and communicate to. These various terms arise from the definitions written into different national laws and also into open licenses. Meeker [12:71–83] provides an extended discussion of what constitutes distribution in relation to software under copyright. New computer practices, such as web-mediated remote execution, further complicate this matter. Jaeger analyses the lawful use and redistribution of energy data from official and semi-official sources within Europe under a range of use cases. In contrast, this article is concerned solely with the unrestricted redistribution of code and data.
As noted, this article draws on US law and German law. The US statute is 17 USC — Copyrights and is available online. The German version is the Urheberrechtsgesetz (UrhG) or the Act on Copyright and Related Rights. An official translation is available.7 The UrhG generally provides more rights for authors than does its US counterpart.
The question of whether the individuals who produced the material, be it code, data, or documentation, or their host institution holds the copyright is not covered here. The situation varies between country, funder policy, contributor status, and any terms of employment. See instead [9,50].
6.1. Standard copyright
As noted, standard copyright is the default state if no license is specified. Standard copyright precludes the use and further distribution of source code outside of a few narrow exceptions. There is no restriction on reading the source code or recycling the ideas contained therein.
Under US law, software which lacks an open license cannot be legally used, built, or run. Meeker [12:148] states in relation to source code on GitHub (emphasis added): “unfortunately, if no licensing terms are applied, the default is that no rights are granted to use the software”. Use in this context would include compiling the source code and running the resulting program. Use would also cover transferring source code to an existing codebase. German copyright law (UrhG § 69c) provides a definitive list on how software, or more precisely “computer programs”, under standard copyright may be used and this provision prohibits the use cases just indicated.8,9
The machine processing of datasets under standard copyright remains unclear. Under German law, a copyrighted dataset downloaded anonymously from the internet and later used in a computer model is quite possibly a breach of copyright. The right to inspect the dataset is implied by its presence on the internet, but other use cases are not. Again under German law, some forms of scientific research may or may not be exempt. Furthermore, it might not be obvious whether the dataset is original or whether it also incorporates copyrighted material from other parties. The situation in the United States is not known.10 In short, there is no legal security in such circumstances.
What is near certain is that energy datasets made available under European regulation 543/2013 by ENTSO-E (European Network of Transmission System Operators for Electricity) can be thus employed. In such cases, Jaeger [19:5] opines that “the data provided … can be … used as input for computer models and analyzed in many ways for scientific and commercial purposes”. This conclusion rests primarily on 543/2013 article 3 which describes the establishment of a central information transparency platform [19:19]. An acknowledgment of ENTSO-E as the source is required in scientific publications and reports [19:24].
As indicated earlier, the republishing of datasets under standard copyright is a breach of copyright. The original URLs may instead be circulated, but these can suffer from typos, linkrot, and resource drift, particularly if served from static websites that are periodically reworked. Moreover, datasets that have been reformatted to suit a particular model framework interface cannot be made public, nor can corrections and enhancements be easily propagated back upstream.
The location of the primary server may be significant for copyright claims involving the unauthorized distribution of content and can determine the choice of law [12:239–240].
The US legal doctrine of fair use permits minor usage for the purposes of critical review and similar. The notion of fair use also applies to the transfer of small amounts of source code (covered later) and other types of digital data. Fair use is not supported under German law, but a number of use cases are exempted, including modest forms of scientific research among associates.11 Outside of these exemptions, there is no provision for even one line of protected source code to be incorporated into another codebase.
6.2. Open licenses
Open licenses add binding conditions to a copyright.12 All open licenses grant the user free use of the work. Permissive licenses require attribution if the work is modified and distributed, while copyleft licenses additionally contain measures to prevent capture (covered shortly). A license terminates if any of its conditions are violated. Open licenses are irrevocable and non-exclusive. Morin et al. discuss the open licensing of scientific software in general, but not scientific data. Fontana et al. offer a free software perspective.
Table 2 lists some commonly encountered licenses, based on family and target. The GPLv2+ notation indicates a GPLv2 license containing the phrase “or any later version” to make that license inbound compatible with the GPLv3. While software projects often adopt the latest release, the 1991 GPLv2+ license is still in wide use, even on new projects. The ODbL and ODC-By licenses are intended solely for databases, whereas the various Creative Commons version 4.0 licenses are designed for use with both datasets and databases.
Table 2. A selection of commonly used open licenses and public domain dedications, based on family and target, with version numbers given where significant.
|License family||Source code||Dataset or database||Content|
|Copyleft licenses||EUPL 1.2, Mozilla 2.0, Eclipse 2.0, LGPLv2.1, LGPLv3, GPLv2, GPLv2+, GPLv3, AGPLv3||CC BY-SA 4.0, ODbL||CC BY-SA 4.0, GFDLv1.3|
|Permissive licenses||ISC, MIT, BSD 3-clause, Apache 2.0||CC BY 4.0, ODC-By||CC BY 4.0|
|Public domain dedications||CC0 1.0||CC0 1.0, PDDL 1.0||CC0 1.0|
Abbreviations: AGPL, GNU Affero General Public License • BSD, Berkeley Software Distribution license • CC BY, Creative Commons Attribution license • CC BY-NC, Creative Commons Attribution NonCommercial license • CC BY-SA, Creative Commons Attribution ShareAlike license • CC0, Creative Commons Zero universal public domain dedication • EUPL, European Union Public License • GFDL, GNU Free Documentation License • GPL, GNU General Public License • ISC, Internet Systems Consortium license • LGPL, GNU Lesser General Public License • MIT, Massachusetts Institute of Technology license • Mozilla or MPL, Mozilla Public License • ODC-By, Open Data Commons Attribution license • ODbL, Open Database License • PDDL, Public Domain Dedication License. The standardized but less readable SPDX identifiers are not used here.
GPL notation: The convention of abbreviating versions 2.0 and 3.0 of the GNU GPL as GPLv2 and GPLv3 respectively is retained here. Similarly for other GNU licenses. The GPLv3+ license also exists but is omitted here for simplicity.
Omitted: The CDLA-Sharing 1.0 and CDLA-Permissive 1.0 Community Data License Agreement licenses, announced by the Linux Foundation on 22 October 2017, are not included here due to their novelty.
Open licenses differ from their proprietary counterparts in that they are not negotiated on a case-by-case basis, nor is any license fee transacted. Indeed, no contact between the user and the copyright holder is required. Open licenses are non-discriminatory by definition, which means that no application domain, including commercial usage, can be legally excluded. The Creative Commons suite of licenses does offer a non-commercial (NC) provision but this qualifier is not widely used or necessarily recommended. Nor is it strictly an open license. The legal boundary between non-commercial and commercial usage may be difficult to delineate, although several exemptions under the German UrhG rely on the concept.
Open licenses contain inbound conditions and outbound conditions. The inbound conditions trip on receipt of the work, and all contemporary licenses, irrespective of type, offer the same inbound conditions: namely, that the user can do whatever they wish locally with the software, data, or content and that the user accepts such use is at their own risk, as far as national law allows. The outbound conditions trigger on distribution and not modification. All licenses require that the copyright holders be acknowledged. An author obligated to transfer their copyright to their employer will therefore not be personally named. With regard to software, permissive and copyleft licenses differ as to whether the source code must be made available when distributing binary files.
Open licenses require that the same conditions be present in any subsequent derivative works. In this sense, they are sticky. The copyleft GPL family, in addition, expressly prohibits further restrictions from being attached.
A potentially important but orthogonal issue to copyright is the way in which patent-encumbered source code introduced by a strategic contributor is handled [12:162–169]. Software licenses vary in their response. Some (Apache 2.0, GPLv3) claim a patent grant, while others terminate defensively for the offender. These matters are not further covered because it is improbable (in the author’s view) that an energy system model would encounter this kind of attack.
6.3. Copyleft licenses
Copyright law was innovatively stood on its head with the release of the general purpose GNU General Public License version 1.0 in February 1989 by programmer and software activist Richard Stallman. The GPL classes as a copyleft license [58,59]. Copyleft licenses are designed primarily to avoid code capture or enclosure. Enclosure is the long-held practice of privatizing common property. The term is used here to describe source code that was originally public being incorporated into closed source programs without subsequent fixes and enhancements being revealed. Copyleft licenses prevent this process, whereas permissive licenses permit it. Indeed the raison d’être for the GPL is to create a software commons that can never be privatized. As Meeker [12:96] remarks, the “GPL is a kind of constitution for the free software world”.
There are several grades of copyleft [12:34] [50:3]. Weaker copyleft (LGPL) allows open libraries to be linked to by proprietary applications, for instance. Ultra-strong copyleft (AGPL) prohibits the remote execution of open software over a network without also making the source code available. This is increasingly relevant in the context of software as a service (SaaS) architectures utilizing browser-based thin clients and similar. Strong copyleft (GPLv2, GPLv2+, GPLv3) fits between these two.13 And weak copyleft (Eclipse 2.0, Mozilla 2.0, EUPL 1.2) sits beneath weaker copyleft because it allows any kind of code integration as long as the copyleft code remains in separate files and is made available on distribution.
The copyleft software licenses were followed by similar licenses for content and then data. The Creative Commons Attribution ShareAlike (CC BY-SA) set of licenses are the best known, with version 4.0 designed for datasets and databases as well. The ODbL, employed by OpenStreetMap, is the most widely used copyleft license specifically crafted for databases.
Most open licenses are now international and intentionally silent on the choice of law (in contrast to their proprietary counterparts).14 This means that litigants are free to select the country and legal system under which they seek redress. As a result, Germany has become, more or less, the jurisdiction of choice for GPL infringement claims [12:234, 244] [61:37]. Such litigation is invariably aimed at enforcement and not relief [61:36]. Claims involving permissive software licenses are rare because the licensing requirements are so lax. But incorrect attribution in other domains, like web publishing, can result in legal action.
It is important to stress that copyleft licenses do not restrict the local use of software, data, or content. As Stallman indicates in relation to software (emphasis added): “the GNU GPL has no substantive requirements about what you do in private, the GPL conditions apply when you make the work available to others”.
Copyleft software licenses were drafted with compiled languages in mind. Some commentators believe new copyleft licenses are needed, better suited to interpreted languages and contemporary programming practices.
6.4. Permissive licenses
The first permissive licenses for software shortly predate the GPL and include the BSD family. The Creative Commons Attribution (CC BY) set of licenses are the most common permissive licenses for content, with version 4.0 intended for datasets and databases as well.
6.5. Public domain
Works residing in the public domain no longer carry exclusive intellectual property rights. These rights might have expired, been forfeited, been expressly waived, or were never applicable. The concept of public domain is a US legal doctrine which has no direct equivalent in Germany and countries with similar civil law traditions. Hence, the CC0 1.0 and PDDL 1.0 public domain dedications fall back to maximally permissive copyright licenses in these jurisdictions [63:4, 11]. These dedications also carry warranty disclaimers similar to those found in open licenses. Under German law, works which fall out of copyright become gemeinfrei and their legal status is being contested by the Wikimedia Foundation as of June 2017 [64:72].
Under US copyright law (17 USC § 105), scientific software and data (among other works) produced (as opposed to contracted) by the US federal government are public domain within the confines of the United States.15 The US government can and does assert copyright to these works in third countries, in accordance with local copyright legislation and established practices [65:3.1.7]. The most visible example of US public domain energy policy software is the National Energy Modeling System (NEMS), which, while freely available and unrestricted in use, makes no attempt to create a community [7:393]. The US Department of Energy OpenEI energy database project, in contrast, serves federal government datasets under an internationally recognized CC0 1.0 public domain dedication.
Although public domain dedications are often made for trivial programs and code snippets, they are rarely used by substantive open source software projects. Public domain dedications are however specifically promoted by some for scientific data because of the flexibility they offer in relation to reuse [67:42], although information provenance can be more difficult to track in the absence of mandatory attribution.
6.6. License notices
A license notice must be added to the codebase, dataset (as metadata if possible), or document in accordance with the particular license type. Permissive software licenses are normally simpler in this regard, requiring only a single standard text file in the top level directory. Copyleft software licenses, in addition, typically require an abridgment in each source file. The exact instructions for each license type can be found on the web. The FSFE REUSE project provides a standardized way of inserting the information. Meeker [12:148] comments that license notices are not always added, even when the license type is announced on the project web page. Readers need to be alert to this possibility.
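As a rough illustration of how per-file notices might be audited, the following Python sketch scans source files for an `SPDX-License-Identifier:` tag, the machine-readable header convention promoted by the REUSE project. The function names, the 20-line search window, and the default `.py` suffix are assumptions made for this example and do not reproduce the REUSE tooling itself.

```python
import re
from pathlib import Path

# Matches the machine-readable tag and captures the rest of the line.
# Assumes line-comment headers (e.g. "# ..." or "// ..."), not block comments.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def find_license_tag(text, max_lines=20):
    """Return the SPDX expression declared near the top of a file, or None."""
    for line in text.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            return match.group(1).strip()
    return None


def files_missing_notice(root, suffixes=(".py",)):
    """List source files under root that carry no SPDX tag in their header."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            if find_license_tag(text) is None:
                missing.append(path)
    return missing
```

A project could run `files_missing_notice(".")` before each release to catch the omissions that Meeker describes.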
6.7. Contribution agreements
Contribution agreements (CA) are used to grant rights from contributors to the project itself [12:196–197]. Such agreements are normally restricted to projects under copyleft licensing and are typically used to provide the flexibility to upgrade to a newer license, relicense under less restrictive conditions, or dual license (covered later) or to manage and enforce license infringements. Contribution agreements effectively transfer trust from the community to the governing body of the project. The FSF employs contribution agreements for all its projects, but the practice is not common. No open energy system project to date employs a contribution agreement.
6.8. European database rights
Another intellectual property right related to but distinct from copyright is a database right [69,70]. A database right protects the financial and professional investment incurred in assembling a public database, but not the individual datasets, which themselves need not reside under copyright [19:15].16 Database rights do not exist in the US because the US Constitution prevents the protection of uncreative collections. To infringe in Europe, “all or a substantial part” of a database must be downloaded, reconstructed, and used in a way that conflicts with the interests of the original database maker.17 The meaning of “substantial” has yet to be determined through court rulings [19:23]. Under German law (UrhG § 87c), exceptions are made for some forms of scientific research.18 Similar exemptions exist in other EU member states. It remains unclear as to whether the ENTSO-E transparency platform is protected as a database. Jaeger [19:21–24] traverses the question but fails to draw a conclusion.
Third-party database rights apply when stocking from official and semi-official sources in Europe and can therefore prove problematic for database projects with servers also located within Europe (such as the OPSD project). Database rights can create difficulties for modeling projects as well. While modeling projects are unlikely to cross the substantial threshold, these same database hosts may be unable to propagate corrections and enhancements to datasets originally obtained from protected databases.
The ODbL, ODC-By, and Creative Commons version 4.0 BY-SA and BY licenses all contain provisions which explicitly address database rights.
6.9. Software patents
Although not a part of copyright law, software patents deserve a brief mention [12:135–182]. Software patents are supported in the US but not to any degree in Europe. Energy system modeling projects must respect third-party software patents. But conflicts are unlikely because the abstract nature of energy system models means they normally fall outside the scope of patentable subject matter.
This article now turns attention towards open code and open data, having covered the motivations for going open and the intellectual property law that applies.
7. Open code
This section starts with definitions. The term code, in this article, refers to text-based source code, whether written in a compiled, interpreted, or translated language. The term covers simple one page scripts to complex codebases comprising tens of thousands of source lines. An executable is a standalone file produced ahead-of-time, which can then be distributed and run on a target system without the original source, although runtime dependencies may still apply.19 The more general term binary is used here to cover executables and compiled libraries, as well as bytecode programs and modules. The term library covers header-only libraries, compiled libraries and their source code, and interpreted language modules, often shipped as source.20 A library normally remains fully separate from the mainline codebase and is not modified. The term software covers all the preceding.
7.1. Software license compatibility
Fig. 3 shows common license compatibilities when utilizing third-party source code and libraries.21 In all cases, the legal conditions imposed by an inbound license cannot be more stringent than those specified by the host license. Identical licenses are naturally compatible. The AGPLv3 license is not fully source compatible with the GPLv3, and when mixed, the source code under each license must remain in separate files. Projects wishing to employ the AGPL should note this limitation. Regarding permissive licenses, the newer ISC license is known for both its simple language and its location near the bottom of the compatibility graph. The ISC does not include a patent grant. The FSF recommends the Apache 2.0 license for small projects (less than 300 source lines) because it can forestall some forms of patent litigation, but (as suggested earlier) such protection seems unnecessary for energy system models.
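The idea of a compatibility graph can be sketched as a reachability test over a directed graph. In the Python example below, an edge from A to B means that code under license A may be incorporated into a project distributed under license B. The edge list is a partial, illustrative assumption assembled from the relationships mentioned in this article and is not a transcription of Fig. 3. The name `can_import` is likewise hypothetical.

```python
from collections import deque

# Illustrative edges only: inbound license -> host licenses it can flow into.
# The GPLv3 -> AGPLv3 edge ignores the separate-files caveat noted in the text.
COMPATIBLE_INTO = {
    "ISC": ["MIT"],
    "MIT": ["BSD 3-clause"],
    "BSD 3-clause": ["Apache 2.0", "GPLv2", "GPLv2+"],
    "Apache 2.0": ["GPLv3"],
    "GPLv2+": ["GPLv2", "GPLv3"],
    "GPLv2": [],            # an unqualified GPLv2 cannot reach the GPLv3
    "GPLv3": ["AGPLv3"],
    "AGPLv3": [],
}


def can_import(inbound, host):
    """Return True if the host license is reachable from the inbound license."""
    if inbound == host:
        return True  # identical licenses are naturally compatible
    seen = {inbound}
    queue = deque([inbound])
    while queue:
        current = queue.popleft()
        for target in COMPATIBLE_INTO.get(current, ()):
            if target == host:
                return True
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return False
```

Under these assumed edges, `can_import("MIT", "GPLv3")` holds while `can_import("GPLv2", "GPLv3")` does not, which mirrors the asymmetry described above.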
Libraries can be divided into strong and weak, depending on whether their license forces the client code to match it or not. Proprietary software can always link to a weak library for instance, although the mandatory requirements on distribution will still apply. Conversely, a strong library forces the client codebase to honor the compatibility graph when shipped. Projects can, of course, relicense to match stricter inbound conditions, but all contributors must concur.
GPL licensed code can be built using proprietary tools and then distributed, given that the resulting program does not link to or otherwise combine with non-GPL-compliant software components. An exception, more tightly defined under GPLv3, is made for the system libraries that ship with proprietary operating systems. As indicated earlier, this restriction does not apply to local usage.
7.2. Distribution architectures
The intended mode of distribution can significantly influence software license selection. Fig. 4 depicts the development and distribution architecture for a typical open energy system modeling project utilizing the git revision control system. The inbound licensing conditions apply when one clones the source code and the outbound conditions apply when one further distributes the source code, binaries, or both. To reiterate, the inbound licensing conditions are identical for copyleft and permissive licenses upon primary distribution, an important fact. Under the architecture shown, the user is responsible for ensuring that any and all third-party dependencies, including specialist libraries, are met on their system. Automated workflow methods can ease this overhead [73:3].
A push call by an independent developer results in a local fork being uploaded to the main repository as a uniquely-named development branch. In legal terms and in this context, a local fork constitutes a derived work and a push call constitutes distribution. A push call is normally followed by a pull request, upon receipt of which the project maintainer solicits community testing and discussion and, if successful, merges the submitted changes into the mainline. When one contributes code in this manner, one consents to the current license while simultaneously retaining the copyright for that contribution. Alternatively, if third-party source code is present, then that material must be inbound license compatible with the project and the third-party copyright holders must be duly registered.
Downstream clusters can form, perhaps mapped to individual research groups or sets of developers working on new features. The Linux kernel project uses second tier repositories to manage each of its major subsystems. Not all projects admit the concept of a mainline: one research group (TEMOA) advocates the use of individual research branches without designating a core trunk [44:340].
The use of containerization, and particularly Docker, is gaining attention as a method of distributing software without the need for the recipient to manage dependencies. This method can thereby assist with scientific reproducibility. But its possible role in open development is not considered here. Nor are software as a service (SaaS) architectures, which may necessitate an AGPLv3 license, traversed here.
7.3. Software license selection
The choice between copyleft and permissive licensing may ultimately be one of capture versus adoption. Copyleft licenses prevent capture while permissive licenses encourage adoption [50:5]. Casad [57:17] cites the example of BSD Unix and Linux. BSD Unix was able to flourish under the permissive BSD license, thereby providing the context for Linux, which, soon after its inception, swapped to the GPLv2 license in 1992. This new license helped keep the Linux project focused and cohesive, something that the BSD Unix family had lacked. Casad [57:17] surmises that:
The GPL lends itself to large projects that keep the community working together on a single code base. Permissive licenses are better suited for smaller, collaborative projects that serve as a core or incubator for a larger ecosystem that might include proprietary implementations.
In terms of advice, Meeker [12:197] suggests that “if a project is unsure about which license to use, it is best to start with a more restrictive license and move to a more permissive license if it becomes evident that more permissiveness will serve the project better”. It may also be useful to solicit contribution agreements from developers who might later lose contact with the project to facilitate relicensing.
There are two broad considerations when weighing up between copyleft and permissive licensing for an energy system modeling project that utilizes the git-based distribution architecture just described. One is strategic and the other tactical. The first concerns the risk of capture on secondary distribution (bottom of figure) versus an increase in the opportunities for uptake by other projects (not depicted). And the second concerns the licensing relationships with third-party libraries and source code that the project may wish to exploit (left side of figure).
The secondary distribution of executables without source code, legal only under permissive licensing, is (in the author’s view) unlikely to be a common occurrence for open energy system projects aimed at public policy.22 In all probability, the independent developer will hail from a university, research institute, specialist consultancy, in-house corporate team, non-governmental organization (NGO), or public agency. Given the specialized nature of energy modeling, none are likely to have much incentive to develop and distribute software in their own right. Instead, they should be rather more inclined to voluntarily push their improvements upstream for scrutiny, testing, and adoption by the wider community. Hence, the issue of code capture may be essentially irrelevant for most projects. Notwithstanding, project teams concerned about capture should select a copyleft license.
The matter of third-party libraries is more involved. Compiled languages offer header-only and compiled libraries, the latter either built by the user, downloaded as a platform-specific binary, or available through an operating system package. Interpreted languages offer modules in both source code and platform-independent bytecode formats, normally obtained via a language-based package management system.23 Projects wishing to employ third-party libraries should carefully evaluate the license compatibilities and distribution options involved. Libraries carrying an LGPL or lower ranking license can be linked to by all other licenses. But of particular note are the mature numerical and optimization libraries written in C or C++ which carry a GPL license and which necessitate that the project adopt a compatible GPL license in order to co-distribute these libraries in some manner.24 These same libraries are routinely offered by other languages via language bindings and wrapper classes.
The direct use of third-party source code is also language dependent. Copying over source is more likely when projects share a similar design and, of course, the same language. There is a belief that pasting in ten lines of source code or less cannot breach copyright, mentioned here in light of copyleft to permissive transfers. There is no such rule under US law, but a fair use defense may well succeed [12:123].
One further intermediate context should be considered. One project (oemof, written in Python and carrying a GPLv3 license) is explicitly structured as a framework in the computer science sense. The intention is to facilitate component reuse by other open projects. This design is probably best suited to interpreted object-oriented languages. Projects wishing to make use of third-party software modules need to again consider the licensing implications, while noting that a GitHub push call constitutes distribution.
Projects may also dual license. If all contributors agree, a project can assign some or all of its codebase to a different open license in order to service different downstream use cases.25 For completeness, a project can also sell proprietary licenses for inclusion in commercial closed source products [12:193].
Ideally and as noted earlier, a project should finalize its software license after reviewing the license compatibilities of the solvers, mathematical libraries, database systems, and other third-party software components and source code that it plans to use or might possibly wish to employ, together with its intended development and distribution architecture. In this context, the GPLv3 license, which resides at the top of the compatibility graph, although still unreachable from an unqualified GPLv2 license, offers the best general prospect for utilizing third-party inputs. Otherwise, the AGPLv3 license can be selected if captured internet-mediated execution is of concern. Conversely, the ISC and MIT licenses, positioned near the bottom of the compatibility graph, afford the best opportunity for code import and library use by other parties, assuming that code adoption is a strategic goal. If patent grants are of concern, then the Apache 2.0 license should be selected instead in this case.
Model frameworks written in a translated language (GAMS, MathProg) need only carry a permissive license. These projects exist only as source code and cannot be subject to capture. The languages themselves do not support the notion of libraries. Therefore, an ISC or MIT license represents a reasonable choice, because stronger licenses offer no tangible benefit. Even so, code transfers between projects are not especially likely, given the terseness of these languages.
Creative Commons and other non-software licenses are not recommended for software, because only software licenses cover technical matters like linking and patents.
7.4. License adoption by modeling projects
Table 3 shows the adoption of open software licenses by open energy modeling projects. Data processing scripts are not included in the tally. Very little is known as to how and why scientific modeling projects choose their licenses. The breakdown between copyleft and permissive software licenses is split evenly, with the GPLv3, Apache 2.0, and MIT licenses being popular.
Table 3. Software license counts for open energy system modeling projects. Table 1 contains a list of the 28 projects surveyed.
|License family and type||License||Count|
|Permissive licenses||Apache 2.0||5|
|Public domain dedications||CC0 1.0|
|Non-software licenses||CC BY-SA 3.0||2|
7.5. Academic projects
There may be legitimate misgivings when opening up an existing project, particularly one hosted by a research institute. The first concerns the intellectual and financial investment in the project to date. Whether to regard that investment as sunk or not is a matter for each team. The second concerns academic reputation. The open source mantra of “release early, release often” [27:28] might not be appropriate: research teams may instead want a degree of finish before publishing their codebase and datasets. The third is a belief that providing support will stretch team resources. Experience suggests otherwise: that although email traffic will increase, the external contributions can easily outweigh this overhead [9:67]. There is, of course, no explicit obligation to support open software and datasets once public. The original team will normally continue to maintain the project, but there is nothing to prevent the codebase from being forked and a new project forming. That said, successful hostile forks are rare.
Energy system models now require too great a level of detail and complexity to be implemented, provisioned, run, and analyzed within a single PhD project. If software is to be developed by a graduate student, then a clear separation from the wider project may be advisable. This can be readily managed under git by creating a local research branch, while periodically merging across improvements from the mainline.
More generally, software developed collectively within an academic context may have to traverse issues that community projects do not. Such issues may include withholding information and ideas, internal and intergroup rivalry, the ownership of names and logos, and project continuity as research projects arise and expire. A neutral code host account may be advisable. Academic norms will also apply. For instance, a failure to cite the author of public domain code does not contravene a legal right, but it may well class as plagiarism.
Traditional open source projects are invariably flatter and more democratic than academic research groups. So it remains to be seen what kind of tensions might arise in terms of governance and what kind of hybrid ethic might emerge as these groups adopt and adapt open development methods.
8. Open data
This section again starts with definitions. The term data refers here primarily to machine-readable datasets. Such datasets may also be human readable if encoded as text and suitably structured. But ultimately these datasets are intended to be machine processed: read into memory by a computer program, cast to native data types, and then manipulated programmatically using integer and floating point arithmetic to derive useful output.
Examples of energy system datasets include asset inventories (constituted as tables), demand, weather, and market state time-series (constituted as arrays), and their associated geolocations (increasingly handled by specialized databases). Data visualization and GIS-based management and interpretation are trending.
Energy system models originally employed structured text files for data interchange. But by the 1990s modelers were considering relational databases for data management. These early efforts however remained local to a project and did not involve internet publishing or open data principles. The first energy sector database project to go public was OpenEI in late-2009, followed by reegle (after restructuring) in 2011.
Dataset (table or file) versioning is widely used by open energy sector database projects, although just one (OEP) offers (relational) database-wide versioning. Some database projects support the creation of derived datasets using relational database queries: SQL for client-server databases and SPARQL for web databases. Such requests, even when confined to datasets under permissive licensing, can lead to a significant compliance overhead in relation to attribution [12:260].
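The attribution overhead for derived datasets can be illustrated with a short Python sketch that walks the lineage of a derived dataset and gathers every copyright holder that must be acknowledged. The `Dataset` class, its fields, and the provider names are hypothetical and are not drawn from any of the database projects discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record of a dataset and the datasets it was derived from."""
    name: str
    license: str
    holders: tuple          # copyright holders to be acknowledged
    sources: tuple = ()     # upstream Dataset objects, if any


def collect_attributions(dataset):
    """Return sorted, unique (holder, license) pairs across the full lineage."""
    found = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for holder in current.holders:
            found.add((holder, current.license))
        stack.extend(current.sources)
    return sorted(found)


# Illustrative lineage: a query result joining two upstream tables.
plants = Dataset("plants", "CC BY 4.0", ("Provider A",))
load = Dataset("load", "CC BY 4.0", ("Provider B", "Provider C"))
joined = Dataset("plants_load_join", "CC BY 4.0", ("Project X",), (plants, load))
```

Even in this small case, `collect_attributions(joined)` yields four separate acknowledgments, and the list grows with every additional source a query touches.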
The crowdsourcing of energy system data is part of the emerging open collaborative research movement known as citizen science. Crowdsourced data brings very different challenges in terms of information integrity, mostly met through ceaseless observation and revision by often anonymous contributors. Crowdsourcing and open development share a similar underlying ethos.
Data privacy has not been a significant issue to date, but may well become so as modelers seek to better represent fine-grained commercial and domestic consumption.
8.1. Data and metadata standards
Technical openness is important for both portability and archiving. Only standardized machine-readable data formats should be employed, using either ASCII or UTF-8 for text encoding.
There are currently few data and metadata standards, recognized or de facto, relevant to energy system datasets. Ludwig Hülk (Reiner Lemoine Institute) is developing a voluntary metadata standard for energy system datasets, leveraging existing open data protocols and employing JSON, a hierarchical human and machine-readable format.26 The standard records the copyright holders and any applicable license, as well as technical attributes and a log of modifications. Metadata also needs to be open licensed to be useful, which raises its own legal issues [63:6–10].
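To make the kind of information such a metadata record might hold more concrete, the Python sketch below assembles a record and serializes it as UTF-8 encoded JSON. The field names and example values are hypothetical and do not reproduce the voluntary standard under development.

```python
import json


def build_metadata(title, holders, license_name, attributes, log):
    """Assemble a hypothetical metadata record for one dataset."""
    return {
        "title": title,
        "copyright_holders": list(holders),
        "license": license_name,
        "attributes": dict(attributes),   # technical attributes
        "modifications": list(log),       # log of modifications
    }


record = build_metadata(
    title="example_power_plants",
    holders=["Example Institute"],
    license_name="CC BY 4.0",
    attributes={"format": "csv", "encoding": "UTF-8", "rows": 120},
    log=[{"date": "2017-06-01", "change": "initial release"}],
)

# Serialize as human-readable JSON and encode as UTF-8 for portability.
encoded = json.dumps(record, indent=2, ensure_ascii=False).encode("utf-8")

# Round trip to confirm that nothing is lost in serialization.
restored = json.loads(encoded.decode("utf-8"))
```

Keeping the license and copyright holders alongside the technical attributes means the legal status travels with the dataset when it is copied or archived.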
8.2. Data license selection
The open licensing of machine-readable data is a new and burgeoning legal field . There is no explicit legislative support in the US or Germany at least, limited case law, and little robust analysis on which to draw.
Crowdsourced data tends to be collected and released under the ODbL database license, because most such projects also draw from OpenStreetMap. The ODbL is particularly problematic for commercial users, due to its copyleft nature. Some commentators think copyleft licensing may not be a suitable model for most types of open data, because compound datasets may need to be compiled from diverse sources and managing the compatibility requirements under such licensing can rapidly prove difficult or impossible. Yet, as with software, the outbound licensing requirements for data apply on distribution and not on local usage or modification.
The permissive Creative Commons CC BY 4.0 license has been adopted by a number of open energy sector database projects (Energy Research Data Portal for South Africa, energydata.info, OEP, OPSD, SMARD) as their preferred choice. But, as with their modeling counterparts, little is known about the reasons behind these selections. Ball [78] reviews the issues involved.
The European Commission Joint Research Centre (JRC) is planning to make part of its Integrated Database of the European Energy Sector (IDEES) public in late 2017 [79]. The database will initially span the years 2000–2015 for all member states. Dataset licensing is to be governed by the JRC policy on data, namely that the “acquisition of data by the JRC from third parties shall, where possible and feasible, be governed by the Open Data principles, and all efforts shall be made to avoid imposition of restrictions to their access and use by the JRC and subsequent users” [80:6]. The Open Data principles, however, remain silent on the right of public users to distribute original and modified works [80:6]. With regard to Commission-sourced data, some kind of attribution license, perhaps the EU reuse and copyright notice [81], has been suggested [82]. The Commission needs to finalize which open licenses it intends to use for these datasets. Metadata is to follow the JRC Data Policy Implementation Guidelines but, as of October 2017, these guidelines are not public.
This article represents an attempt to transfer some of the knowledge built up in the open source software world over the past 30 years to a scientific research context, and more specifically, knowledge about running complex, geographically distributed software projects that rely on discretionary contributions and negotiated goals. This transfer can be seen as an experiment taking place across all areas of quantitative science, but one that is more sharply delineated for the energy policy modeling domain for at least two reasons. First, the legal context for data must be properly resolved because much of this data originates from official, semi-official, and commercial sources. Second, the policy models themselves must become more transparent in order to improve public engagement and ultimately public trust.
Energy system modelers need to be crystal clear on their motivation for opening up their models, or more specifically, their code, data, and documentation. Public transparency poses the lowest disclosure threshold, met, in many cases, by publishing good documentation and the input and output datasets under standard copyright. Supporting publications should not reside behind paywalls and should ideally be open access. Scientific reproducibility requires additionally that the code and data be released under open licenses, so that other researchers can verify the analysis, experiment with the code and data, report their findings, and distribute any modifications. Other researchers can then repeat and extend the process and so on.
Open development means that the core developers wish to build a community of users and contributors, or at least allow secondary communities to form around a common codebase. It remains to be seen whether the open source development ethos can be successfully ported to an academic context. The crowdsourcing of energy system data for research purposes represents a similar trial. Both activities sit at the intersection between scientific practice and internet-based collaboration. Open development embeds and extends the requirements for both transparency and reproducibility and may ultimately prove a better vehicle for building public acceptance. Open development can potentially contribute to the scientific process through reduced duplication of effort, improved error detection and correction, and easier collaboration within and across research fields.
For the reasons discussed, the choice of software license may have limited effect on the execution of a modeling project. Notwithstanding, technical considerations can be significant, influenced by the implementation language (compiled, interpreted, translated), software dependencies (headers, modules, compiled libraries), opportunities for the inbound and outbound transfer of source code, and the intended mode of distribution (git server, SaaS, containers, other). Projects licensed under strong copyleft will effectively remain in community ownership in perpetuity.
The question of dataset licensing is more difficult. Where possible and appropriate, permissive licenses should be applied to open data to provide flexibility. Public domain dedications place the least encumbrance on users but do little to assist with provenance and integrity. Crowdsourced projects are often required to adopt the copyleft ODbL license because they also draw on information from OpenStreetMap.
The legal status of energy system datasets from official and semi-official sources in Europe needs attention and resolution, particularly in regard to their hosting by open energy database projects. It is essential that such data is able to be curated, stored indefinitely, and used freely for research and policy analysis without database providers and system modelers having to operate, at best, within a legal gray zone.
Conflicts of interest
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Tom Brown, Berit Erlach, Ludwig Hülk, Tim Tröndle, Lukas Wienholt, Frauke Wiese, and Andrew Wilson provided feedback on earlier versions. Traffic on Open Energy Modeling Initiative forums helped raise and clarify key issues. Open licensing was discussed by the open data focus group at the first EMP–E meeting in Brussels in May 2017. Mark Howells and Joe DeCarolis kindly confirmed their motivations for releasing OSeMOSYS and TEMOA respectively. Two unnamed lawyers, both specialists in open licensing, willingly answered my questions on United States and German copyright law. The license compatibility diagram (Fig. 3) was discussed and corrected on the Free Software Foundation Europe (FSFE) Legal Network mailing list. The author is grateful for these contributions and for the valuable comments from two anonymous reviewers.
[1] Acatech, Leopoldina, Akademienunion (Eds.), Consulting with Energy Scenarios: Requirements for Scientific Policy Advice, Acatech – National Academy of Science and Engineering, Berlin, Germany (2016)
[2] M. Bazilian, A. Rice, J. Rotich, M. Howells, J.F. DeCarolis, S. Macmillan, C. Brooks, F. Bauer, M. Liebreich, Open source software and crowdsourcing for energy analysis, Energy Pol., 49 (2012), pp. 149-153, 10.1016/j.enpol.2012.06.032
[3] K.-K. Cao, F. Cebulla, J.J. Gómez Vilchez, B. Mousavi, S. Prehofer, Raising awareness in model-based energy scenario studies: a transparency checklist, Energy, Sustain. Soc., 6 (1) (2016), pp. 28-47, 10.1186/s13705-016-0090-z
[4] J.F. DeCarolis, K. Hunter, S. Sreepathi, The case for repeatable analysis with energy economy optimization models, Energy Econ., 34 (2012), pp. 1845-1853, 10.1016/j.eneco.2012.07.004
[5] R. Morrison, Optimizing Exergy-services Supply Networks for Sustainability, MSc Thesis, Physics Department, University of Otago, Dunedin, New Zealand (2000)
[6] R.D. Peng, Reproducible research in computational science, Science, 334 (6060) (2011), p. 1226, 10.1126/science.1213847
[7] S. Pfenninger, Energy scientists must show their workings, Nature, 542 (2017), p. 393, 10.1038/542393a
[8] S. Pfenninger, J.F. DeCarolis, L. Hirth, S. Quoilin, I. Staffell, The importance of open data and software: is energy research lagging behind?, Energy Pol., 101 (2017), pp. 211-215, 10.1016/j.enpol.2016.11.046
[9] S. Pfenninger, L. Hirth, I. Schlecht, E. Schmid, F. Wiese, T. Brown, C. Davis, M. Gidden, H. Heinrichs, C. Heuberger, S. Hilpert, U. Krien, C. Matke, A. Nebel, R. Morrison, B. Müller, G. Pleßmann, M. Reeg, J.C. Richstein, A. Shivakumar, I. Staffell, T. Tröndle, C. Wingenbach, Opening the black box of energy modelling: strategies and lessons learned, Energy Strat. Rev., 19 (2017), pp. 63-71, 10.1016/j.esr.2017.12.002
[10] F. Wiese, G. Bökenkamp, C. Wingenbach, O. Hohmeyer, An open source energy system simulation model as an instrument for public participation in the development of strategies for a sustainable future, Wiley Interdisciplinary Reviews: Energy Environ., 3 (5) (2014), pp. 490-504, 10.1002/wene.109
[11] S. Williams, Free as in Freedom (2.0): Richard Stallman and the Free Software Revolution (second ed.), Free Software Foundation (FSF), Boston, Massachusetts, USA (2010)
[12] H. Meeker, Open (Source) for Business: a Practical Guide to Open Source Software Licensing (second ed.), CreateSpace Independent Publishing Platform, North Charleston, South Carolina, USA (2017)
[13] O. Geden, Climate advisers must maintain integrity, Nature, 521 (2015), pp. 27-28, 10.1038/521027a
[14] N. Strachan, B. Fais, H. Daly, Reinventing the energy modelling–policy interface, Nat. Energy, 1 (16012), 10.1038/nenergy.2016.12
[15] S. Pye, C. Bataille, Improving deep decarbonization modelling capacity for developed and developing country contexts, Clim. Pol., 16 (S1) (2016), pp. 27-46, 10.1080/14693062.2016.1173004
[16] M. Howells, H.H. Rogner, N. Strachan, C. Heaps, H. Huntington, S. Kypreos, A. Hughes, S. Silveira, J. DeCarolis, M. Bazilian, A. Roehrl, OSeMOSYS: the open source energy modeling system: an introduction to its ethos, structure and development, Energy Pol., 39 (10) (2011), pp. 5850-5870, 10.1016/j.enpol.2011.06.033
[17] T. Bruckner, Dynamische Energie- und Emissionsoptimierung regionaler Energiesysteme, PhD thesis, Institut für Theoretische Physik, Universität Würzburg, Würzburg, Germany (1997)
[18] T. Bruckner, Benutzerhandbuch Deeco, Version 1.0, Institut für Energietechnik, Technische Universität Berlin, Berlin, Germany (2001)
[19] T. Jaeger, Legal Aspects of European Electricity Data, Legal Opinion, JBB Rechtsanwälte, Berlin, Germany (2017)
[20] European Commission, Commission regulation (EU) No 543/2013 of 14 June 2013 on submission and publication of data in electricity markets and amending Annex I to Regulation (EC) No 714/2009 of the European Parliament and of the Council, Offi. J. Eur. Union L, 163 (2013), pp. 1-12
[21] European Commission, H2020 Programme: Guidelines to the Rules on Open Access to Scientific Publications and Open Access to Research Data in Horizon 2020, Version 3.2, European Commission Directorate-General for Research and Innovation, Brussels, Belgium (2017)
[22] D.C. Ince, L. Hatton, J. Graham-Cumming, The case for open computer programs, Nature, 482 (7386) (2012), pp. 485-488, 10.1038/nature10836
[23] V. Stodden, D. Bailey, J. Borwein, R. LeVeque, B. Rider, W. Stein (Eds.), Setting the Default to Reproducible: Reproducibility in Computational and Experimental Mathematics, ICERM Workshop on Reproducibility in Computational and Experimental Mathematics (2013)
[24] Red Hat, TOSW 0.2.2: the Open Source Way: Creating and Nurturing Communities of Contributors, Red Hat, Raleigh, North Carolina, USA (2009)
[25] J. Rifkin, The Zero Marginal Cost Society, Palgrave Macmillan, New York, USA (2014)
[26] K. Raworth, Doughnut Economics: Seven Ways to Think like a 21st-century Economist, Random House, New York, USA (2017)
[27] E.S. Raymond, The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, O’Reilly Media, Sebastopol, California, USA (2001)
[28] J. Moore, Revolution OS, Documentary, Wonderview Productions, Wilmington, Delaware, USA (2001)
[29] S. Bhartiya, World domination: an interview with Greg Kroah-Hartman, Linux Magazine (193) (2016), pp. 14-16
[30] F. Gardumi, A. Shivakumar, M. Howells, Y. Almulla, C. Taliotis, V. Sridharan, E. Ramos, A. Beltramo, M. Welsch, R. Morrison, J. Hörsch, T. Niet, T. Burandt, G. P. Balderrama, O. Broad, E. Zepeda, T. Alfstad, From the development of an open-source energy modelling tool to its application and the creation of communities of practice: the example of OSeMOSYS, Energy Strategy Reviews, under review
[31] C. Morris, A. Jungjohann, Energy Democracy: Germany’s Energiewende to Renewables, Palgrave Macmillan, London, United Kingdom (2016), 10.1007/978-3-319-31891-2
[32] C. Dieckhoff, E. Pissarskoi, Credible worlds, possibilistic foreknowledge and policy decisions: a case study on energy scenarios, INEM 2011, IX Conference of the International Network for Economic Method (2011)
[33] T. Bruckner, Decarbonizing the global energy system: an updated summary of the IPCC report on mitigating climate change, Energy Technol., 4 (1) (2016), pp. 19-30, 10.1002/ente.201500387
[34] C. Woolston, Scientists are cautious about public outreach, Nature, 518 (7540) (2015), p. 459, 10.1038/518459f
[35] K. Ram, Git can facilitate greater reproducibility and increased transparency in science, Source Code Biol. Med., 8 (1) (2013), p. 7, 10.1186/1751-0473-8-7
[36] GitHub, Open Source License Usage on GitHub.Com, GitHub, San Francisco, California, USA (2015)
[37] W. Medjroubi, U.P. Müller, M. Scharf, C. Matke, D. Kleinhans, Open data in power grid modelling: new approaches towards transparent grid models, Energy Rep., 3 (2017), pp. 14-21, 10.1016/j.egyr.2016.12.001
[38] J. Rivera, J. Leimhofer, H.-A. Jacobsen, OpenGridMap: towards automatic power grid simulation model generation from crowdsourced data, Comput. Sci. Res. Dev., 32 (1) (2017), pp. 13-23, 10.1007/s00450-016-0317-4
[39] C. Davis, Making Sense of Open Data: from Raw Data to Actionable Insight, PhD Thesis, Delft University of Technology, Delft, The Netherlands (2012)
[40] S. Pfenninger, I. Staffell, Long-term patterns of European PV output using 30 years of validated hourly reanalysis and satellite data, Energy, 114 (2016), pp. 1251-1265, 10.1016/j.energy.2016.08.060
[41] I. Staffell, S. Pfenninger, Using bias-corrected reanalysis to simulate current and future wind power output, Energy, 114 (2016), pp. 1224-1239, 10.1016/j.energy.2016.08.068
[42] M. Howells, H.H. Rogner, I. Jalal, M. Isshiki, An Open Source Energy Planning Approach: SOFT-MESSAGE, Presentation (2008) (SOFT-MESSAGE was later renamed OSeMOSYS)
[43] J.F. DeCarolis, K. Hunter, S. Sreepathi, The TEMOA Project: Tools for Energy Model Optimization and Analysis, Department of Civil, Construction, and Environmental Engineering, North Carolina State University, Raleigh, North Carolina, USA (2010)
[44] K. Hunter, S. Sreepathi, J.F. DeCarolis, Modeling for insight using tools for energy model optimization and analysis (Temoa), Energy Econ., 40 (2013), pp. 339-349, 10.1016/j.eneco.2013.07.014
[45] B. Rose, Clean Electricity Western Australia 2030: Modelling Renewable Energy Scenarios for the South West Integrated System, Sustainable Energy Now, West Perth, WA, Australia (2016)
[46] S. Quoilin, I. Hidalgo González, A. Zucker, Modelling Future EU Power Systems under High Shares of Renewables: the Dispa-SET 2.1 Open-source Model, EUR 28427 EN, Publications Office of the European Union, Luxembourg (2017), 10.2760/25400
[47] J. Kitzes, D. Turek, F. Deniz (Eds.), The Practice of Reproducible Research: Case Studies and Lessons from the Data-intensive Sciences, University of California Press, Oakland, California, USA (2017)
[48] Legal Information Institute, 17 USC: Copyrights. The United States Copyright Statute, Legal Information Institute, Ithaca, New York, USA (2017)
[49] Juris, Act on Copyright and Related Rights (Urheberrechtsgesetz, UrhG), Amendments to 20 December 2016, Official Translation, Juris, Saarbrücken, Germany (2017). This version lacks the revisions enacted on 30 June 2017
[50] A. Morin, J. Urban, P. Sliz, A quick guide to software licensing for the scientist-programmer, PLoS Comput. Biol., 8 (7) (2012), Article e1002598, 10.1371/journal.pcbi.1002598
[51] T. Jaeger, A. Metzger, Open Source Software: Rechtliche Rahmenbedingungen der Freien Software (fourth ed.), CH Beck, Munich, Germany (2016)
[52] J. Grimmelmann, Copyright for literate robots, Iowa Law Rev., 101 (2) (2016), pp. 657-681
[53] J.D. Wren, 404 not found: the stability and persistence of URLs published in MEDLINE, Bioinformatics, 20 (5) (2004), pp. 668-672, 10.1093/bioinformatics/btg465
[54] P. Samuelson, Copyright’s fair use doctrine and digital data, Publish. Res. Q., 11 (1) (1995), pp. 27-39, 10.1007/BF02680415
[55] R. Fontana, B.M. Kuhn, E. Moglen, M. Norwood, D.B. Ravicher, K. Sandler, J. Vasile, A. Williamson, A Legal Issues Primer for Open Source and Free Software Projects, Version 1.5.2, Software Freedom Law Center, New York, USA (2008)
[56] Creative Commons, Data, Creative Commons, Mountain View, California, USA (2013)
[57] J. Casad, Copyleft: the GPL and the birth of a revolution, Linux Magazine (200) (2017), pp. 14-18
[58] GNU, Frequently Asked Questions about the GNU Licenses, GNU Project, Boston, Massachusetts, USA (2017), webpage version 2017-06-09T15:22:31
[59] B.M. Kuhn, A.K. Sebro Jr., D. Gingerich, Copyleft and the GNU General Public License: a Comprehensive Tutorial and Guide (2015)
[60] A. Wilson, Personal Comment (2 November 2017)
[61] T. Jaeger, Enforcement of the GNU GPL in Germany and Europe, Journal of Intellectual Property, Information Technology and E-Commerce Law (JIPITEC), 1 (1) (2010), pp. 34-39
[62] R. Stallman, Interpreting, Enforcing and Changing the GNU GPL, as Applied to Combining Linux and ZFS, Free Software Foundation (FSF), Boston, Massachusetts, USA (2016)
[63] T. Kreutzer, Validity of the Creative Commons Zero 1.0 Universal Public Domain Dedication and its Usability for Bibliographic Metadata from the Perspective of German Copyright Law, Büro für Informationsrechtliche Expertise, Berlin, Germany (2011)
[64] U. Bantle, Neue Urteile zum Urheberrecht, Linux-Magazin (08/17) (2017), pp. 72-73
[65] B. Klein, G. Hodge (Eds.), Frequently Asked Questions about Copyright, CENDI Secretariat, Information International Associates, Oak Ridge, Tennessee, USA (2008)
[66] D. Brodt-Giles, WREF 2012: OpenEI: an Open Energy Data and Information Exchange for International Audiences, National Renewable Energy Laboratory (NREL), Golden, Colorado, USA (2012)
[67] V. Stodden, Enabling Reproducible Research: Open Licensing for Scientific Innovation (2009)
[68] FSFE, REUSE Project Documentation, Free Software Foundation Europe (FSFE), Berlin, Germany (2017)
[69] European Parliament and European Council, Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, Offi. J. Eur. Union L, 77 (1996), pp. 20-28
[70] X. Wu, EC database directive, Berk. Technol. Law J., 17 (1), Article 13, 10.15779/Z38VH5D
[71] R.P. Merges, One hundred years of solicitude: intellectual property law, 1900–2000, Calif. Law Rev., 88 (6) (2000), pp. 2187-2240, 10.2307/3481215
[72] D. Wheeler, The Free-libre/Open Source Software (FLOSS) License Slide (2007)
[73] C. Boettiger, An introduction to Docker for reproducible research, with examples from the R environment, ACM SIGOPS – Oper. Syst. Rev., 49 (1) (2015), pp. 71-79, 10.1145/2723872.2723882
[74] J. Bacon, The Art of Community: Building the New Age of Participation (second ed.), O’Reilly Media, Sebastopol, California, USA (2012)
[75] H.-M. Groscurth, Design and management of energy databases, Energy Sources, 17 (4) (1995), pp. 445-457, 10.1080/00908319508946093
[76] C. Franzoni, H. Sauermann, Crowd science: the organization of scientific research in open collaborative projects, Res. Pol., 43 (1) (2014), pp. 1-20, 10.1016/j.respol.2013.07.005
[77] H. Meeker, Personal Comment (22 July 2017)
[78] A. Ball, How to License Research Data, Digital Curation Centre (DCC), Edinburgh, United Kingdom (2014)
[79] T. Wiesenthal, POTEnCIA and JRC-IDEES: a New Modelling Toolset for the European Energy Sector, Presentation, EMP–E Meeting, Brussels, Belgium (2017)
[80] C. Doldirina, A. Friis-Christensen, N. Ostlaender, A. Perego, A. Annoni, I. Kanellopoulos, M. Craglia, L. Vaccari, G. Tartaglia, F. Bonato, P. Triaille Jean, S. Gentile, JRC Data Policy, Report EUR 27163 EN, Publications Office of the European Union, Luxembourg (2015), 10.2788/607378
[81] European Commission, Commission decision of 12 December 2011 on the reuse of Commission documents, 2011/833/EU, Offi. J. Eur. Union L, 330 (2011), pp. 39-42
[82] A. Zucker, Data Openness in JRC Models, Presentation, EMP–E Meeting, Brussels, Belgium (2017)
1. MathProg is an open language from the GNU GLPK project that supports a subset of the proprietary AMPL language.
2. Regulation 543/2013 [20] states (3 § 1): “The data shall be up to date, easily accessible, downloadable and available for at least five years. Data updates shall be time-stamped, archived and made available to the public.”
3. The GAMS Development Corporation has indicated it will support genuine community projects on a case-by-case basis.
4. GitHub is a US-based code hosting website using the git distributed revision control system as its backend. Other well known platforms include GNU Savannah, SourceForge, and GitLab.
5. Balmorel was initially released under standard copyright and belatedly added an ISC license in 2017. deeco was distributed with a GPLv2+ license but retired in 2005 when key closed source programming libraries lost vendor support. SOFT-MESSAGE, see Ref. [42], added an Apache 2.0 license in 2009 or thereabouts and was later renamed OSeMOSYS.
6. The copyright of digital data varies across jurisdictions and is a rapidly evolving area of law. The line between unprotected facts and copyrightable material in continental Europe is somewhat different from that stated above.
7. This version does not include revisions made on 30 June 2017. These changes primarily address text and data mining and are not especially relevant here [19:44–45]. But new provisions that relax the use of public databases for scientific research are of interest [19:43–44].
8. A “computer program” is accorded a wide definition under UrhG § 69a (1) and (2) and covers source code and binaries as well as “preparatory design material”. In line with the general precepts of copyright, subsection (2) also confirms that the “ideas and principles which underlie any element of a computer program, including the ideas and principles which underlie its interfaces” cannot be protected.
9. Jaeger and Metzger [51:129] write, in relation to executable software, (translation confirmed by Till Jaeger, emphasis added): “The UrhG § 69c (1) assumes a broad concept for copying which includes not only a permanent copy on a storage medium, but also the temporary loading into main (RAM) memory or processor cache. This leads to the conclusion that a copyright permission is required for the mere use of a piece of software. Thus, the construction of UrhG § 69a and following sections differs from classical copyright. Anyone who uses an analog work as intended does not require permission from the author and in particular no rights of use: reading a novel, listening to music, or viewing an artwork are not processes which can be prohibited by exclusive copyrights.”
10. Grimmelmann [52] contends that the machine reading of copyrighted text by robots does not infringe US copyright law. But it is not clear how the facts he presents might relate here.
11. More specifically, UrhG § 52a (2) 2 states in relation to content that “it shall be permissible [to publish] limited parts of a work, small scale works, as well as individual articles from newspapers or periodicals.”
12. In relation to the open licensing of software, readers fluent in German are referred to Jaeger and Metzger [51] regarding German and European law. Meeker [12] provides an excellent treatment in English with a focus on US law, but also reviews the case law developing internationally, including in Germany. Neither work traverses open data.
13. The GPLv1 would now classify as a weak copyleft license and its abuse by commercial developers led to the strengthened GPLv2 license in 1991.
14. One notable exception is the EUPL, which specifies that European law and courts are to be used unless otherwise agreed.
15. More specifically, 17 USC § 101 [48] limits this provision to “a work prepared by an officer or employee of the United States Government as part of that person’s official duties”.
16. More specifically, directive 96/9/EC 7 § 1 [69] protects a database in which “there has been qualitatively and/or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents”.
17. Directive 96/9/EC 7 § 2 (b) [69].
18. More specifically, UrhG § 87c (1) states (30 June 2017 revision) that a “substantial part of a database” may be reproduced and used “for the purpose of scientific research according to sections 60c and 60d”. And § 60c (2) permits (emphasis added) the reproduction of up to 75% of a database for non-commercial or commercial “personal research” [19:43–44].
19. This definition covers compiled languages like C and C++ which produce machine code for later execution. It also extends, for the purposes of this discussion, to interpreted languages like Python which can be compiled ahead-of-time to bytecode for later interpretation and execution. In this case, a suitable interpreter must be present on the host system. In practice, it is not common to distribute Python programs in this manner, if only by custom. In contrast, Java is usually compiled to platform-independent bytecode for circulation, to be later run on a local Java virtual machine (JVM). Java is therefore treated here as a compiled language due to this mode of distribution.
20. Header-only libraries are restricted to C++ and are implemented using template metaprogramming.
21. The diagram is a directed acyclic graph (DAG), but without a complete transitive reduction for reasons of intuitiveness and readability. Licenses which are legally equivalent share the same vertex.
22. Electricity sector models with sophisticated power flow algorithms might present an exception, although there is nothing to stop a closed source developer from reimplementing these routines where a copyleft license prevents their direct inclusion. The source code and any documentation are, after all, public knowledge.
23. The Python Software Foundation (PSF) license for Python 2.0.1 and above is inbound compatible with the GPLv2 license and above.
24. For example, the GLPK MILP solver embeds a GPLv3 license and requires a GPLv3 or AGPLv3 client if shipped, whereas the COIN-OR CBC MILP solver carries the permissive Eclipse license and can be bundled by software under any open license and also by proprietary software.
25. PyPSA, for instance, dual licenses under the GPLv3 and Apache 2.0 licenses [9:65].
26. https://forum.openmod-initiative.org/t/breakout-group-about-metadata.
© 2018 The Author. Published by Elsevier Ltd.