Libraries.io already supports most of the largest package managers, but there are many more that we haven't added yet. This guide will take you through the steps for adding support for another one.
Adding support for a new package manager is fairly easy, provided that the package manager's registry has an API for extracting data about its packages over the internet. Follow these steps:
Add a new file to app/models/package_manager. This will be a Ruby class, so the filename should be all lower case and end in .rb, for example: app/models/package_manager/foobar.rb
The basic structure of the class should look like this:
module PackageManager
  class Foobar < Base
  end
end
Note that the class name must begin with a capital letter and contain only letters, numbers and underscores. Ideally, the class name will match the formatting of the package manager's official name, e.g. CocoaPods.
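The naming rule above can be checked mechanically. This is a minimal sketch: valid_package_manager_class_name? and VALID_CLASS_NAME are hypothetical helpers for illustration, not part of Libraries.io, and they encode only the rule stated here (a leading capital letter followed by letters, numbers and underscores).

```ruby
# Hypothetical helper (not part of Libraries.io): a leading capital letter,
# then only ASCII letters, digits and underscores.
VALID_CLASS_NAME = /\A[A-Z][A-Za-z0-9_]*\z/

def valid_package_manager_class_name?(name)
  VALID_CLASS_NAME.match?(name)
end

puts valid_package_manager_class_name?("CocoaPods") # → true
puts valid_package_manager_class_name?("npm")       # → false (lower-case first letter)
puts valid_package_manager_class_name?("Foo-Bar")   # → false (hyphen not allowed)
```

This is also why a package manager like npm needs a differently cased class name such as NPM.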
There are three basic methods that each package manager class needs to implement to enable minimal support in Libraries.io:
Libraries.io needs to know the names of all the projects available in a package manager to be able to index them. The #project_names method should return them as an array of strings.
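As a minimal offline sketch of #project_names, assume a hypothetical registry whose index endpoint returns one JSON object keyed by package name, plus a metadata key that starts with an underscore. The SAMPLE_INDEX string is an assumption standing in for the HTTP response a real class would fetch with get, so the parsing logic runs without a network connection.

```ruby
require "json"

# Canned body standing in for a hypothetical "all packages" endpoint.
# "_updated" is assumed registry metadata, not a package name.
SAMPLE_INDEX = <<~JSON
  {
    "_updated": 1500000000,
    "left-pad": { "description": "String left pad" },
    "express": { "description": "Web framework" }
  }
JSON

# Parse the index and keep only the keys that are package names.
def project_names(body = SAMPLE_INDEX)
  JSON.parse(body).keys.reject { |key| key.start_with?("_") }
end

p project_names # → ["left-pad", "express"]
```

Filtering by a known prefix is a little more robust than dropping a fixed position, but it relies on the registry never allowing package names that start with an underscore.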
Different package managers provide different ways of getting this data. Here are some examples:
- npm provides one huge JSON endpoint containing all the packages, so we pluck just the keys from the top-level object in the response, dropping the first key, which is registry metadata rather than a package name:
def self.project_names
  get("https://registry.npmjs.org/-/all").keys[1..-1]
end
- Haxelib lists all the project names on an HTML page, so we use Nokogiri to pluck them all out:
def self.project_names
  get_html("https://lib.haxe.org/all/").css('.project-list tbody th').map { |th| th.css('a').first.try(:text) }
end
- Julia stores all the packages in a Git repository, so we clone the repo and list the top-level folder names. It's not ideal, but it works:
def self.project_names
  # Remove any stale checkout first, otherwise git clone fails and ls lists old data
  @project_names ||= `rm -rf METADATA.jl; git clone https://github.com/JuliaLang/METADATA.jl --depth 1; ls METADATA.jl`.split("\n")
end
Once we have a list of package names, we need to be able to fetch the information for each package from the registry by its name, which is the job of the #project method. It is also used for syncing/updating a package we already know about when a new version is published.
This method takes the package name as a string, usually makes an HTTP request to the registry for that name, and returns a Ruby hash of information, often parsed from JSON or XML.
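A minimal sketch of a #project method, assuming a hypothetical registry at registry.example.org that serves one JSON document per package. The CANNED hash is an assumption standing in for the HTTP request so the example runs offline; it shows building the URL (escaping / in scoped names, as the npm example does) and returning the parsed JSON as a Ruby hash.

```ruby
require "json"

# Hypothetical registry base URL and canned responses keyed by request URL,
# standing in for the HTTP GET a real PackageManager class would make.
REGISTRY = "https://registry.example.org"
CANNED = {
  "#{REGISTRY}/@scope%2Fwidget" => '{"name":"@scope/widget","description":"A widget"}'
}.freeze

# Build the request URL, escaping "/" in scoped names, and return a Ruby hash.
def project(name)
  url = "#{REGISTRY}/#{name.gsub('/', '%2F')}"
  body = CANNED.fetch(url) # raises KeyError for unknown names in this sketch
  JSON.parse(body)
end

p project("@scope/widget")["description"] # → "A widget"
```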
Some examples:
- Packagist has a JSON endpoint, and we select just the 'package' attribute from the response:
def self.project(name)
  get("https://packagist.org/packages/#{name}.json")['package']
end
- npm has a JSON endpoint, but we need to escape the / in scoped module names:
def self.project(name)
  get("http://registry.npmjs.org/#{name.gsub('/', '%2F')}")
end
- Hackage doesn't have a JSON endpoint for package information, so we scrape the HTML of the page instead:
def self.project(name)
  {
    name: name,
    page: get_html("http://hackage.haskell.org/package/#{name}")
  }
end
After getting the information about a package from the registry, we need to format that data into something that fits nicely in the Libraries.io database. The #mapping method takes the result of the #project method and returns a hash with some or all of the following keys:
- name: the name of the project, usually the same as the name originally passed to #project
- description: a description of the project, usually a couple of paragraphs, not the whole README
- repository_url: the URL where the source code for the project is hosted, often a GitHub, GitLab or Bitbucket repo page
- homepage: the URL for the homepage of the project, if different from the repository_url
- licenses: an array of SPDX license short names that the project is licensed under, e.g. ['MIT', 'GPL-2.0']
- keywords_array: an array of keywords or tags that can be used to categorize the project
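The following plain-Ruby sketch shows the shape of a mapping for a hypothetical registry whose raw data uses the keys summary, source, license and tags (all assumed). It does not use MappingBuilder or repo_fallback, so it only approximates what the Cargo example does: it falls back to the homepage as the repository URL when the homepage points at a code host, omits homepage when it duplicates repository_url, and drops nil values.

```ruby
# Hypothetical raw project data from a made-up registry (field names assumed).
RAW = {
  "name" => "widget",
  "summary" => "Widgets for everyone",
  "homepage" => "https://github.com/example/widget",
  "source" => nil,
  "license" => "MIT",
  "tags" => ["ui", "widgets"]
}.freeze

# Hosts treated as source code hosting when falling back to the homepage.
CODE_HOSTS = %w[github.com gitlab.com bitbucket.org].freeze

# Map raw registry data onto the keys Libraries.io expects, dropping nils.
def mapping(raw)
  homepage = raw["homepage"]
  repository_url = raw["source"]
  # Fall back to the homepage when it looks like a code host URL.
  if repository_url.nil? && homepage && CODE_HOSTS.any? { |host| homepage.include?(host) }
    repository_url = homepage
  end

  {
    name: raw["name"],
    description: raw["summary"],
    repository_url: repository_url,
    homepage: (homepage unless homepage == repository_url), # only if different
    licenses: Array(raw["license"]),
    keywords_array: Array(raw["tags"])
  }.compact
end

p mapping(RAW)
```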
Here's an example from Cargo:
def self.mapping(raw_project)
  MappingBuilder.build_hash({
    name: raw_project['crate']['id'],
    homepage: raw_project['crate']['homepage'],
    description: raw_project['crate']['description'],
    keywords_array: Array.wrap(raw_project['crate']['keywords']),
    licenses: raw_project['crate']['license'],
    repository_url: repo_fallback(raw_project['crate']['repository'], raw_project['crate']['homepage'])
  })
end
Not all package managers support the following concepts, but many do. Implementing these optional methods in a PackageManager class enables more features in Libraries.io:
The #versions method is for package managers that have a concept of discrete versions being published.
This method takes the data returned by the #project method and should return an array of hashes, one for each version, containing the version number and a published_at date recording when that version was originally published.
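A minimal sketch of #versions for a hypothetical registry whose raw project data lists releases with version and created fields (both field names are assumptions). It returns one hash per version and parses the ISO 8601 timestamps into Time objects; the NuGet example passes the same two fields through VersionBuilder instead.

```ruby
require "time"

# Hypothetical raw data as returned by #project for a made-up registry.
RAW_PROJECT = {
  "name" => "widget",
  "releases" => [
    { "version" => "1.0.0", "created" => "2020-01-15T10:00:00Z" },
    { "version" => "1.1.0", "created" => "2020-03-02T08:30:00Z" }
  ]
}.freeze

# Return one hash per published version: its number and publish time.
def versions(raw_project, _name = nil)
  raw_project.fetch("releases", []).map do |release|
    {
      number: release["version"],
      published_at: Time.iso8601(release["created"])
    }
  end
end

versions(RAW_PROJECT).each do |v|
  puts "#{v[:number]} #{v[:published_at].utc.iso8601}"
end
```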
Here's an example from NuGet:
def self.versions(raw_project, _name)
  raw_project[:releases].map do |item|
    VersionBuilder.build_hash(
      number: item['catalogEntry']['version'],
      published_at: item['catalogEntry']['published']
    )
  end
end
The #one_version method is for package managers where we can update a single version instead of fetching all versions.
This method takes the data returned by the #project method, along with a version string, and should return a single version with the same data that #versions returns for each version.
def self.one_version(raw_project, version_string)
  # Find the raw entry for the requested version; nil if it isn't listed
  item = raw_project["versions"].find { |v| v["number"] == version_string }
  return nil unless item

  {
    number: item["number"],
    published_at: item["published"]
  }
end
The #dependencies method is for package managers where each version can have dependencies.
This method returns the dependencies for a particular version of a package. It receives a name, a version and, optionally, the data returned from the #project method, and should return an array of hashes, one for each dependency.
Each dependency hash should include the following attributes:
- project_name: the name of the package of the dependency
- requirements: the version requirements of this dependency, for example ~> 2.0
- kind: regular dependencies are runtime, but this could also be development, test, build or something else
They can also have extra attributes:
- optional: some package managers have the concept of optional dependencies; if yours does, set this as a boolean
- platform: this will almost always be self.name.demodulize, the same platform as the package manager, but if dependencies come from a different package manager you can override it
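To show the optional and platform attributes together, here is a minimal sketch for a hypothetical manifest format with separate dependencies and optionalDependencies objects (both keys are assumptions). PLATFORM is a fixed string standing in for self.name.demodulize, which needs ActiveSupport, and empty requirement strings become '*', as in the Haxelib example.

```ruby
require "json"

# Hypothetical manifest for one version of a package in a made-up registry.
MANIFEST = <<~JSON
  {
    "dependencies": { "core-lib": "^2.0", "logger": "" },
    "optionalDependencies": { "fast-json": ">= 1.3" }
  }
JSON

PLATFORM = "Foobar" # stands in for self.name.demodulize in a real class

# Build one hash per dependency; an empty requirement means any version.
def dependencies(manifest_json)
  manifest = JSON.parse(manifest_json)
  build = lambda do |deps, optional|
    (deps || {}).map do |dep_name, requirement|
      {
        project_name: dep_name,
        requirements: requirement.to_s.empty? ? "*" : requirement,
        kind: "runtime",
        optional: optional,
        platform: PLATFORM
      }
    end
  end
  build.call(manifest["dependencies"], false) + build.call(manifest["optionalDependencies"], true)
end

dependencies(MANIFEST).each { |dep| p dep }
```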
Example from Haxelib:
def self.dependencies(name, version, _mapped_project)
  json = get_json("https://lib.haxe.org/p/#{name}/#{version}/raw-files/haxelib.json")
  return [] unless json['dependencies']

  json['dependencies'].map do |dep_name, dep_version|
    {
      project_name: dep_name,
      # An empty version string means any version is acceptable
      requirements: dep_version.empty? ? '*' : dep_version,
      kind: 'runtime',
      platform: self.name.demodulize
    }
  end
rescue StandardError
  []
end
For package managers with a lot of packages, downloading the full list of names can take a long time. If you can provide a list of recently added or updated package names through a #recent_names method, Libraries.io can check it more frequently. It should return a list of names in the same way that #project_names does, for example:
- Pub's package list API returns packages ordered by most recently updated, so we can just grab the first page and map out the names:
def self.recent_names
  get("https://pub.dartlang.org/api/packages?page=1")['packages'].map { |project| project['name'] }
end
Many package managers have a command-line interface for installing individual packages. If you add an #install_instructions method, Libraries.io will show the instructions on the project page so anyone can easily install the package.
This method is passed a project object and, optionally, a version number. Here are some examples:
- RubyGems adds a -v flag if a version is passed:
def self.install_instructions(db_project, version = nil)
  "gem install #{db_project.name}" + (version ? " -v #{version}" : "")
end
- For Go, the version is ignored and a plain go get command is shown:
def self.install_instructions(db_project, version = nil)
  "go get #{db_project.name}"
end
If the package manager's official name doesn't fit Ruby's class name rules, you can return its official name from a #formatted_name method. For example, npm is always written in lower case, but the class name is NPM, so we have added the following:
def self.formatted_name
  'npm'
end
If the package manager registry has a predictable URL structure, we can generate useful URLs for each project, which are used where available:
If the package manager registry website has an individual page for each package, add a #package_link method that returns its URL.
It takes a project object and an optional version number, for example:
def self.package_link(db_project, version = nil)
  "https://rubygems.org/gems/#{db_project.name}" + (version ? "/versions/#{version}" : "")
end
If the package manager provides predictable URLs for the tarball or zip archive of each package, add a #download_url method that returns it.
It takes a project object and an optional version number, for example:
def self.download_url(db_project, version = nil)
  "https://rubygems.org/downloads/#{db_project.name}-#{version}.gem"
end
If the package manager provides hosted documentation for each package, add a #documentation_url method that returns its URL.
It takes a package name and an optional version number, for example:
def self.documentation_url(name, version = nil)
  "http://www.rubydoc.info/gems/#{name}/#{version}"
end
Libraries.io will regularly request the #package_link URL to check for a 200 status code. If the package manager registry always returns a 200, or the class doesn't have a #package_link method, you can add a #check_status_url method that provides a different URL, one that returns a 200 if the package still exists or a 404 if it has been removed.
It takes a project object, for example:
def self.check_status_url(db_project)
  "https://rubygems.org/api/v1/versions/#{db_project.name}"
end
Constants are added to each PackageManager class to provide more metadata about the level of support that Libraries.io has for that package manager:
If the PackageManager class has a #versions method, set HAS_VERSIONS to true:
HAS_VERSIONS = true
If the PackageManager class has a #dependencies method, set HAS_DEPENDENCIES to true:
HAS_DEPENDENCIES = true
If the package manager has a website, set URL to its full URL, including the protocol:
URL = 'https://rubygems.org'
Most application-level package managers focus on one main programming language. Set COLOR to the hex colour for that language from the github-linguist gem; you can see the full list of colours in languages.yml:
COLOR = '#701516'
HIDDEN doesn't need to be set for any active package managers, but if one is shut down and should no longer be shown on the site, set it to true:
HIDDEN = true
Once your PackageManager class is ready, you can add the required rake tasks to download.rake.
Depending on the package manager's size, popularity and frequency of updates, there are different tasks to add:
If the PackageManager class defines a #recent_names method, Libraries.io can check for new updates frequently by calling #import_recent_async on the class. Add a rake task that looks like this:
desc 'Download recent Rubygems packages asynchronously'
task rubygems: :environment do
  PackageManager::Rubygems.import_recent_async
end
For package managers that don't have a proper concept of versions (Go and Bower are good examples, since they fall back to git tags), we don't need to re-check packages we already know about. Calling #import_new_async only downloads packages that aren't already in the database:
desc 'Download new Bower packages asynchronously'
task bower: :environment do
  PackageManager::Bower.import_new_async
end
For the initial import of all packages, add a foobar_all task that calls #import_async. This task will be run daily if no #recent_names method is defined:
desc 'Download all Rubygems packages asynchronously'
task rubygems_all: :environment do
  PackageManager::Rubygems.import_async
end
For package managers whose download process can't easily be parallelized (for example, if it requires cloning a Git repo), the import can be done synchronously instead, with a task that calls #import on the class:
desc 'Download all Inqlude packages'
task inqlude: :environment do
  PackageManager::Inqlude.import
end
Once the PackageManager class is ready, there are some optional updates that can be made to other repositories to enable more functionality.
Depper polls RSS feeds and JSON API endpoints to check for new and updated packages and then enqueues jobs to download those packages. It helps reduce the load on the package manager registries and push new data into the system faster. You will want to add a new ingestor that understands how to track changes in the package manager.
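The core of tracking changes is remembering how far through a feed you have already read. The sketch below is hypothetical and is not Depper's actual ingestor interface: ChangeTracker, its cursor, and the name/updated entry fields are all assumptions. It keeps the newest timestamp it has seen and returns only the names updated after it, de-duplicated, which are the names an ingestor would enqueue for download.

```ruby
require "time"

# Hypothetical change tracker (not Depper's API): remembers the newest
# timestamp seen and only reports entries updated after it.
class ChangeTracker
  attr_reader :cursor

  def initialize(cursor = Time.at(0))
    @cursor = cursor
  end

  # entries: hashes with "name" and ISO 8601 "updated" keys, e.g. parsed
  # from a registry's JSON feed of recent updates.
  def new_names(entries)
    fresh = entries.select { |e| Time.iso8601(e["updated"]) > @cursor }
    @cursor = fresh.map { |e| Time.iso8601(e["updated"]) }.max unless fresh.empty?
    fresh.map { |e| e["name"] }.uniq
  end
end

feed = [
  { "name" => "widget", "updated" => "2024-05-01T12:00:00Z" },
  { "name" => "gadget", "updated" => "2024-05-01T12:05:00Z" },
  { "name" => "widget", "updated" => "2024-05-01T12:10:00Z" }
]

tracker = ChangeTracker.new
p tracker.new_names(feed) # → ["widget", "gadget"]
p tracker.new_names(feed) # → []
```

Persisting the cursor between polls (for example in Redis or a database) is what lets a restarted ingestor resume without re-enqueueing the whole feed.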
If your package manager has an icon, adding it to the Pictogram repository will enable it to show up on the site.
Check out the documentation on adding a logo for a new package manager in the Pictogram repo: https://github.com/librariesio/pictogram