Best practice for extracting license text from artifacts
The Artifactory license API helps me identify what type of license an artifact uses (MIT, GPL, etc.), but doesn't appear to provide a way to extract the precise text of the artifact's license. In other words, if I use Alice's foo library under an MIT license, that almost certainly means I have to include the actual text of the license (like "(c) 2015 Alice...") somewhere in my documentation.
I want to automate the generation of a report for the closure of all my transitive dependencies, like this:
foo version 1.0.1:
(c) 2015 Alice
bar version 0.9.3:
(c) 2013-2014 Bob
What's the best practice for extracting the license text from an arbitrary artifact? For example, if the artifact is a zip/tar, what's the best way to leverage Artifactory to search for a LICENSE/LICENCE file in the archive and give me the text? Given that libraries aren't consistent in how they include their own license text, is there some standard metadata in Artifactory to indicate the full text of the library's license?
Right now, the approach I'm considering is to manually scour the artifact, find its license text, and add that text as a string property (maybe called "custom.licenseText"?) on the artifact in Artifactory. Then, at build time, I can query Artifactory for the value of "custom.licenseText" for each dependency and concatenate all those values together.
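Here is a rough sketch of that build-time step in Python, assuming the property has already been set by hand on each artifact. It uses Artifactory's item properties request (GET /api/storage/{repoKey}/{itemPath}?properties=...). The server URL, repository key, artifact paths, and helper names are all placeholders I made up for illustration; authentication is left out, and treating a 404 as "property not set" is an assumption about how the server responds.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Placeholder values: adjust to your own server and property name.
BASE_URL = "http://localhost:8081/artifactory"
LICENSE_PROP = "custom.licenseText"


def properties_url(base_url, repo_key, item_path, prop=LICENSE_PROP):
    """Build the item properties request URL for one artifact."""
    path = urllib.parse.quote(f"{repo_key}/{item_path}")
    return f"{base_url}/api/storage/{path}?properties={urllib.parse.quote(prop)}"


def fetch_license_text(base_url, repo_key, item_path, prop=LICENSE_PROP):
    """Return the property value(s) joined by newlines, or None if unset."""
    url = properties_url(base_url, repo_key, item_path, prop)
    try:
        with urllib.request.urlopen(url) as resp:
            body = json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: a 404 here means the property is not set on the item.
        if err.code == 404:
            return None
        raise
    # Property values come back as a list of strings per property name.
    values = body.get("properties", {}).get(prop)
    return "\n".join(values) if values else None


def format_report(entries):
    """Format (name, version, license_text) tuples into the report layout."""
    lines = []
    for name, version, text in entries:
        lines.append(f"{name} version {version}:")
        lines.append(text if text else "(no license text recorded)")
    return "\n".join(lines)
```

The build would call fetch_license_text once per resolved dependency and pass the collected tuples to format_report. Keeping the formatting separate from the HTTP call means the report layout can be checked without a running server.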
Is there an easier way to do this, or is this on Artifactory's roadmap?
Re: Best practice for extracting license text from artifacts
The Archive Entry Search REST API can be used to search for license files inside archives.
By default, Artifactory indexes the content of various types of archives. The types of indexed archive files can be found in <ARTIFACTORY_HOME>/etc/mimetypes.xml.
For example, the following request will look for all files starting with LICENSE within indexed archives: