Best practice for extracting license text from artifacts

Best practice for extracting license text from artifacts

The Artifactory license API helps me identify what type of license an artifact uses (MIT, GPL, etc.), but doesn't appear to provide a way to extract the precise text of the artifact's license. In other words, if I use Alice's foo library under an MIT license, that almost certainly means I have to include the actual text of the license (like "(c) 2015 Alice...") somewhere in my documentation.

I want to automate the generation of a report for the closure of all my transitive dependencies, like this:

foo version 1.0.1:
(c) 2015 Alice
MIT ...

bar version 0.9.3:
(c) 2013-2014 Bob
BSD ...

What's the best practice for extracting the license text from an arbitrary artifact?  For example, if the artifact is a zip/tar, what's the best way to leverage Artifactory to search for a LICENSE/LICENCE file in the archive and give me the text?  Given that libraries aren't consistent in how they include their own license text, is there some standard metadata in Artifactory to indicate the full text of the library's license?

Right now, the approach I'm considering is to manually scour the artifact, find its license text, and add that text as a string property (maybe called "custom.licenseText"?) on the artifact in Artifactory. Then, at build time, I can query Artifactory for the value of "custom.licenseText" for each dependency and concatenate all those values together.
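
The property-based approach described above could be sketched roughly as follows. This is only an illustration under assumptions, not an Artifactory-provided feature: it assumes the license text has already been stored in a "custom.licenseText" property, that the item-properties endpoint lives under api/storage, and that it returns JSON shaped like {"properties": {"custom.licenseText": ["..."]}}. The Dependency type, the helper names and the sample paths are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import NamedTuple

PROPERTY = "custom.licenseText"  # hypothetical property name proposed in this post


class Dependency(NamedTuple):
    name: str
    version: str
    storage_path: str  # repository path of the artifact (made-up layout)


def properties_url(base_url: str, dep: Dependency) -> str:
    """Build the item-properties URL for one dependency (assumed URL layout)."""
    path = urllib.parse.quote(dep.storage_path)
    return f"{base_url.rstrip('/')}/api/storage/{path}?properties={PROPERTY}"


def license_text_from_response(body: str) -> str:
    """Pull the property value out of the (assumed) JSON response body."""
    values = json.loads(body).get("properties", {}).get(PROPERTY, [])
    return "\n".join(values)


def fetch_license_text(base_url: str, dep: Dependency) -> str:
    """Fetch the stored license text; needs a reachable Artifactory instance."""
    with urllib.request.urlopen(properties_url(base_url, dep)) as resp:
        return license_text_from_response(resp.read().decode("utf-8"))


def format_report(entries: list[tuple[Dependency, str]]) -> str:
    """Concatenate license texts in the 'name version X:' layout shown above."""
    sections = [
        f"{dep.name} version {dep.version}:\n{text.strip()}"
        for dep, text in entries
    ]
    return "\n\n".join(sections) + "\n"
```

At build time, the report would then be something like format_report([(dep, fetch_license_text(base_url, dep)) for dep in deps]), where deps is the resolved transitive dependency list. Authentication is left out here and would have to be added for a secured instance.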

Is there an easier way to do this, or is this on Artifactory's roadmap?

Re: Best practice for extracting license text from artifacts

The Archive Entry Search REST API can be used for searching for license files inside archives.
By default, Artifactory indexes the contents of various types of archives. The types of indexed archive files can be found in <ARTIFACTORY_HOME>/etc/mimetypes.xml.
For example, the following request looks for all files whose names start with LICENSE within indexed archives:

curl -uadmin:password "http://localhost:8081/artifactory/api/search/archive?name=LICENSE*"

An example result for this query would be:

    "entry" : "./LICENSE.txt",
    "archiveUris" : [ "http://localhost:8081/artifactory/api/storage/jcenter-cache/junit/junit/4.7/junit-4.7.jar" ]

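The same search can be scripted. Below is a minimal Python sketch that issues the request and flattens the response into (entry, archive URI) pairs. It assumes the fragment shown above is one element of a top-level "results" list in the JSON response; the function names are hypothetical, and the credentials are whatever your instance uses.

```python
import base64
import json
import urllib.parse
import urllib.request


def search_url(base_url: str, name_pattern: str) -> str:
    """Build the Archive Entry Search URL for a file-name pattern."""
    query = urllib.parse.urlencode({"name": name_pattern})
    return f"{base_url.rstrip('/')}/api/search/archive?{query}"


def parse_hits(body: str) -> list[tuple[str, str]]:
    """Flatten the response into (entry, archive_uri) pairs.

    Assumes the entries are wrapped in a top-level "results" list.
    """
    hits = []
    for result in json.loads(body).get("results", []):
        for uri in result.get("archiveUris", []):
            hits.append((result["entry"], uri))
    return hits


def find_license_entries(base_url, user, password, name_pattern="LICENSE*"):
    """Run the search with HTTP basic auth; needs a reachable instance."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        search_url(base_url, name_pattern),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as resp:
        return parse_hits(resp.read().decode("utf-8"))
```

Since libraries spell the file name differently, the search would probably need to be repeated with other patterns such as LICENCE* or COPYING*.
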
Based on the result, you can download the license file content and analyze it.
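
One way to do that, sketched in Python under assumptions: download the whole archive and read the entry locally. The sketch assumes the plain download URL can be derived from the api/storage URI by dropping the "/api/storage" segment (which matches the URL layout in the example result, but is not confirmed here), and that the archive is zip-compatible, as jars are. The helper names are hypothetical.

```python
import io
import zipfile


def download_url(storage_uri: str) -> str:
    """Turn an api/storage URI into a plain download URL (assumed layout)."""
    return storage_uri.replace("/api/storage/", "/", 1)


def read_entry(archive_bytes: bytes, entry: str) -> str:
    """Return the text of one entry from a zip/jar held in memory."""
    # Search results report entries like "./LICENSE.txt"; zip names have no "./".
    name = entry[2:] if entry.startswith("./") else entry
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name).decode("utf-8", errors="replace")
```

The archive bytes could come from urllib.request.urlopen(download_url(uri)).read(), with credentials added as needed. Tar-based archives would need the tarfile module instead of zipfile.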

As a side note, this is an interesting use case, and we are thinking about ways to support it on our roadmap.