Documentation
class jadbio.client.JadbioClient(username: str, password: str, host: str | None = None)
This class exposes the main JADBio functionality to Python users through API calls. Requests are HTTP GET and POST only: POST requests are used for any kind of resource creation, mutation, or deletion, while GET requests are read-only and idempotent.
__init__(username: str, password: str, host: str | None = None)
Handles communication with the backend.
-
Parameters
-
username (str) – JADBio account username or email.
-
password (str) – JADBio account password.
-
host (str) – Host endpoint (optional; normally does not need to be set).
-
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
get_version()
Returns the full version number of the currently deployed API.
-
Returns
public API version
-
Return type
str
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> print(client.get_version())
1.0-beta
login(username: str, password: str)
Logs the user in and stores the credentials in the current session (any previous credentials are overwritten). Creates a session in the current client instance.
-
Parameters
-
username (str) – JADBio account username or email.
-
password (str) – JADBio account password.
-
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.login('[email protected]', 'another password')
logout()
Logout. Removes credentials from the current session.
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.logout()
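When scripting against the API, it is good practice to release the session even if something fails along the way. A minimal sketch using try/finally, with the same placeholder credentials as the examples above:
>>> client = JadbioClient('[email protected]', 'a password')
>>> try:
...     print(client.get_version())
... finally:
...     client.logout()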
create_project(name: str, descr: str = '')
Creates a new project.
-
Parameters
-
name (str) – Project name.
-
descr (str) – Project description.
-
-
Returns
projectID
-
Return type
str
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.create_project("this_is_a_project_name")
'314'
get_project(project_id: str)
Returns a project.
-
Parameters
project_id (str) – The id of the project. The user must have read permissions to the project.
-
Returns
{ projectId: string, name: string, description?: string }
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_project("314")
{'projectId': '314', 'name': 'this_is_a_project_name'}
get_projects(offset: int = 0, count: int = 10)
Returns a sublist of projects of a user.
-
Parameters
-
offset (int) – Project list offset. The first element starts at offset 0.
-
count (int) – Limit on the number of projects to return. No limit is applied if count is negative.
-
-
Returns
{offset: number, totalCount: number, data: [{ projectId: string, name: string, description?: string }]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
Constraints: Offset must be a non-negative integer.
- Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_projects()
{'offset': 0, 'totalCount': 1,
'data': [{'projectId': '462', 'name': 'test'}]}
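Because the listing is paged, collecting every project requires stepping through the offsets. A minimal sketch, assuming only the offset, totalCount and data fields documented above:
>>> all_projects = []
>>> offset, count = 0, 10
>>> while True:
...     page = client.get_projects(offset=offset, count=count)
...     all_projects.extend(page['data'])
...     offset += count
...     if offset >= page['totalCount']:
...         break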
delete_project(project_id: str)
Allows clients to delete a specified project. Beware: this operation silently deletes all datasets, analyses, and models contained in that project.
-
Parameters
project_id (str) – The id of the project. It must be owned by the user.
-
Returns
{ projectId: string, name: string, description?: string }
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.delete_project(client.create_project("this_is_a_project_name"))
{'projectId': '463', 'name': 'this_is_a_project_name'}
upload_file(file_id: int, pth_to_file: str)
Allows clients to upload files to be analyzed.
-
Parameters
-
file_id (int) – An identifier provided by the client. Reusing the same file_id overwrites the previously uploaded file. The file_id must be specified in subsequent requests to create a dataset from the raw uploaded file. The file to upload is provided directly as the body of the request.
-
pth_to_file (str) – Path to the file to be uploaded.
-
-
Returns
file_id
-
Return type
int
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.upload_file(1234, "pth/to/file.csv")
1234
create_dataset(name: str, project_id: str, file_id: int, file_size_in_bytes: int, separator: str = ',', has_samples_in_rows: bool = True, has_feature_headers_name: bool = True, has_sample_headers: bool = True, description: str = '')
Create a dataset from an uploaded file.
-
Parameters
-
name (str) – Name of the dataset (must have at least 3 and at most 60 characters and must be unique within the target project).
-
project_id (str) – The id of the project. The dataset will be added to this project. The user must have write permissions to the project.
-
file_id (int) – Id provided by the client when the file was uploaded.
-
file_size_in_bytes (int) – Must match the actual size of the uploaded file in bytes (e.g. os.path.getsize('pth/to/file.csv')).
-
separator (str) – specifies the characters used to separate values in the file.
-
has_samples_in_rows (bool) – must be true iff rows of the uploaded file correspond to samples.
-
has_feature_headers_name (bool) – True if dataset contains feature names.
-
has_sample_headers (bool) – True if dataset contains sample names.
-
description (str) – Description of created dataset (can be at most 255 characters).
-
-
Returns
taskId
-
Return type
str
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> from os.path import getsize
>>> client = JadbioClient('[email protected]', 'a password')
>>> pid = client.create_project("this_is_a_project_name")
>>> fid = client.upload_file(1234, "pth/to/file.csv")
>>> client.create_dataset('file', pid, fid,
...     getsize("pth/to/file.csv"), has_sample_headers=False)
'689'
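create_dataset is asynchronous and only returns a task id; the dataset id becomes available once the task finishes. A minimal polling sketch using the task id returned above (the 'error' state is an assumption; only 'finished' appears in the examples in this document):
>>> import time
>>> status = client.get_task_status('689')
>>> while status['state'] not in ('finished', 'error'):
...     time.sleep(3)
...     status = client.get_task_status('689')
>>> dataset_id = status.get('datasetId')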
get_dataset(dataset_id: str)
Returns details of a specific dataset.
-
Parameters
dataset_id (str) – Identifies the dataset which must belong to a project to which the user must have read access.
-
Returns
{datasetId: string, projectId: string, name: string, description?: string, sampleCount: number, featureCount: number, sizeInBytes: number}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_dataset('6065')
{'projectId': '462', 'datasetId': '6065', 'name': 'datasetName',
'sampleCount': 150, 'featureCount': 6, 'sizeInBytes': 4507}
attach_dataset(name: str, project_id: str, dataset_id: str)
Attach a dataset from another project to a destination project.
-
Parameters
-
name (str) – Used to name the new dataset. It must have at least 3 and at most 60 characters and must be unique within the target project.
-
project_id (str) – The id of the project. The dataset will be attached to this project. The user must have write permissions to the project.
-
dataset_id (str) – Identifies the source dataset. It must belong to a project to which the user has read permissions.
-
-
Returns
{datasetId: string, projectId: string, name: string, description?: string, sampleCount: number, featureCount: number, sizeInBytes: number}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.attach_dataset('name_of_attached_dataset', '462', '6065')
{'projectId': '462', 'datasetId': '6065', 'name': 'name_of_attached_dataset',
'sampleCount': 150, 'featureCount': 6, 'sizeInBytes': 4507}
get_datasets(project_id: str, offset: int = 0, count: int = 10)
Returns a sublist of datasets contained in a project.
-
Parameters
-
project_id (str) – The id of the project. The user must have read permissions to the project.
-
offset (int) – Dataset list offset. The first element starts at offset 0.
-
count (int) – Limit on the number of datasets to return. No limit is applied if count is negative.
-
-
Returns
{ projectId: string, offset: number, totalCount: number, data: [{projectId: string, datasetId: string, name: string, description?: string, sampleCount: number, featureCount: number, sizeInBytes: number}]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
Constraints: Offset must be a non-negative integer.
- Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_datasets('462')
{'projectId': '462', 'offset': 0, 'totalCount': 2, 'data': [
{'projectId': '462', 'datasetId': '6065', 'name': 'file',
'sampleCount': 150, 'featureCount': 6, 'sizeInBytes': 4507}
]}
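Dataset ids are assigned by the server, so a common pattern is to look one up by its name. A minimal sketch using a negative count to disable paging, as documented above; ids shown are the illustrative ones from this example:
>>> datasets = client.get_datasets('462', count=-1)['data']
>>> ids_by_name = {d['name']: d['datasetId'] for d in datasets}
>>> ids_by_name['file']
'6065'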
delete_dataset(dataset_id: str)
Delete a specified dataset. Beware that this operation silently deletes all associated analyses and model uses.
-
Parameters
dataset_id (str) – Identifies the dataset. It must belong to a project to which the user has write permissions.
-
Returns
{datasetId: string, name: string, description?: string, sampleCount: number, featureCount: number, sizeInBytes: number}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.delete_dataset('6065')
{'datasetId': '6065', 'name': 'file', 'sampleCount': 150,
'featureCount': 6, 'sizeInBytes': 4507}
change_feature_types(dataset_id: str, new_name: str, changes: list)
Create an alternative version of a specified dataset using different feature types.
-
Parameters
-
dataset_id (str) – Identifies the source dataset. It must belong to a project to which the user has read and write permissions. The new dataset will be attached to that same project.
-
new_name (str) – Used to name the new dataset. It must have at least 3 and at most 60 characters and must be unique within the target project.
-
changes (list) – list of { matcher: dict, newType: string }. Each element of the changes array matches some features of the source dataset as specified by the matcher field. The types of those features in the new dataset will be changed to the type given by newType, whose value must be one of 'numerical', 'categorical', 'timeToEvent', 'event', or 'identifier'. If some feature is matched by multiple matchers, the last matching entry in the changes array determines its new type.
'byName' field provides the exact names of the features to match:
{ matcher: { 'byName': [name1, ..., nameN] }, newType: string }
'byIndex' field provides the 0-based column indices of the features to match:
{ matcher: { 'byIndex': [id1, ..., idN] }, newType: string }
'byCurrentType' field provides the current type of the features to match. It must be one of 'numerical', 'categorical', 'timeToEvent', 'event', or 'identifier':
{ matcher: { 'byCurrentType': string }, newType: string }
'byDeducedType' field provides the "deduced" type of the features to match. It must be either 'categorical' or 'identifier'. Features have "deduced" types in cases where the feature data did not provide sufficient clarity about the intended type, so JADBio deduced a type on a best-effort basis:
{ matcher: { 'byDeducedType': string }, newType: string }
-
-
Returns
taskId
-
Return type
str
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> changes = [{'matcher': {'byName': ['variable1']},
... 'newType': 'categorical'}]
>>> client.change_feature_types('6065', 'file_cat', changes)
'691'
>>> import time
>>> time.sleep(3) # normally this would be done in a while loop
>>> client.get_task_status('691')
{'taskId': '691', 'state': 'finished', 'datasetIds': ['6067'],
'datasetId': '6067'}
change_feature_types_check(dataset_id: str, new_name: str, changes: list)
Check for possible warnings and/or errors in the creation of an alternative version of a specified dataset using different feature types.
-
Parameters
-
dataset_id (str) – Identifies the source dataset. It must belong to a project to which the user has read and write permissions. The new dataset will be attached to that same project.
-
new_name (str) – Used to name the new dataset. It must have at least 3 and at most 60 characters and must be unique within the target project.
-
changes (list) – list of { matcher: dict, newType: string }. Each element of the changes array matches some features of the source dataset as specified by the matcher field. The types of those features in the new dataset will be changed to the type given by newType, whose value must be one of 'numerical', 'categorical', 'timeToEvent', 'event', or 'identifier'. If some feature is matched by multiple matchers, the last matching entry in the changes array determines its new type.
'byName' field provides the exact names of the features to match:
{ matcher: { 'byName': [name1, ..., nameN] }, newType: string }
'byIndex' field provides the 0-based column indices of the features to match:
{ matcher: { 'byIndex': [id1, ..., idN] }, newType: string }
'byCurrentType' field provides the current type of the features to match. It must be one of 'numerical', 'categorical', 'timeToEvent', 'event', or 'identifier':
{ matcher: { 'byCurrentType': string }, newType: string }
'byDeducedType' field provides the "deduced" type of the features to match. It must be either 'categorical' or 'identifier'. Features have "deduced" types in cases where the feature data did not provide sufficient clarity about the intended type, so JADBio deduced a type on a best-effort basis:
{ matcher: { 'byDeducedType': string }, newType: string }
-
-
Returns
{errors?: [string], warnings?: [string], suggestions?: [string]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> changes = [{'matcher': {'byName': ['variable1']},
... 'newType': 'numerical'}]
>>> client.change_feature_types_check('6065', 'file_cat', changes)
{"warnings: ["IdentityTransformation"]}
get_task_status(task_id: str)
Returns the status of an asynchronous task running on the server.
-
Parameters
task_id (str) – The identity of the task.
-
Returns
{taskId: string, state: string, datasetId?: string, datasetIds?: [string]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_task_status('689')
{'taskId': '689', 'state': 'finished', 'datasetIds': ['6065'],
'datasetId': '6065'}
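Since several calls (create_dataset, change_feature_types) return task ids, a small polling helper avoids repeating the same loop. A minimal sketch; it assumes 'finished' is the only terminal state of interest and does not handle failed tasks:
>>> import time
>>> def wait_for_task(client, task_id, poll_seconds=3):
...     """Poll get_task_status until the task reports 'finished'."""
...     status = client.get_task_status(task_id)
...     while status['state'] != 'finished':
...         time.sleep(poll_seconds)
...         status = client.get_task_status(task_id)
...     return status
>>> wait_for_task(client, '689')['datasetId']
'6065'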
analyze_dataset(dataset_id: str, name: str, outcome: dict, model_selection_protocol: dict | None = None, thoroughness: str = 'preliminary', core_count: int = 1, grouping_feat: str | None = None, models_considered: str = 'all', feature_selection: str = 'mostRelevant', max_signature_size=None, max_visualized_signature_count=None)
Initiate an analysis of a specified dataset.
-
Parameters
-
dataset_id (str) – Identity of a dataset attached to a project to which the user has execute permissions.
-
name (str) – Provides the analysis with a human-readable name for future reference. The name can be at most 120 characters long.
-
outcome (dict) – dictionary. Specifies both the type of analysis intended, and the dataset feature or features that are to be predicted.
Regression analysis: outcome = {'regression': 'target_variable_name'}
Classification analysis: outcome = {'classification': 'target_variable_name'}
Survival analysis: outcome = {'survival': {'event': 'event_variable_name', 'timeToEvent': 'time_to_event_variable_name'}}
- model_selection_protocol (dict) – dictionary. Specifies the model selection protocol (optional). Currently only cross-validation is supported. Note: the parameters of cross-validation are automatically selected by JADBio.
Cross validation: model_selection_protocol = {'type': 'cv'}
-
grouping_feat (str) – Specifies an Identifier feature that groups samples which must not be split across training and test datasets during analysis, e.g. because they are repeated measurements from the same patient (optional).
-
models_considered (str) – must be either 'interpretable' or 'all'. This parameter controls the types of models considered during the search for the best one. Interpretable models include only models that are easy to interpret, such as linear models and decision trees.
-
feature_selection (str) – must be either 'mostRelevant' or 'mostRelevantOrAll' (optional).
-
thoroughness (str) – must be one of 'preliminary', 'typical', or 'extensive'. This parameter is used to reduce or expand the number of analysis configurations attempted in the search for the best ones; it significantly affects the running time of the analysis.
-
core_count (int) – Positive integer. It specifies the number of compute cores to use during the analysis, and must be at most the number of cores currently available to the user.
-
max_signature_size (int) – The maximum number of features used in a model found by the analysis. When present, it must be a positive integer. When not present, a default value of 25 is used.
-
max_visualized_signature_count (int) – The maximum number of signatures that will be prepared for visualization in the user interface. When present, it must be a positive integer. When not present, a default value of 5 is used.
-
-
Returns
analysis_id
-
Return type
str
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.analyze_dataset('6067', 'file_classification',
... {'classification': 'variable1'})
'5219'
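analyze_dataset only starts the analysis; the result is collected once the analysis reaches the 'finished' state. A minimal sketch continuing from the analysis id above (the performance value shown is the illustrative one from the get_analysis_result example further below):
>>> import time
>>> while client.get_analysis_status('5219')['state'] != 'finished':
...     time.sleep(10)
>>> result = client.get_analysis_result('5219')
>>> result['models']['best']['performance']
{'Area Under the ROC Curve': 0.9979193891504624}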
analyze_dataset_custom_preprocessing(dataset_id: str, name: str, outcome: dict, thoroughness: str = 'preliminary', core_count: int = 1, grouping_feat: str | None = None, models_considered: str = 'all', feature_selection: str = 'mostRelevant', max_signature_size=None, max_visualized_signature_count=None, custom_preprocessing: List[Tuple[str, str]] = [])
Initiate an analysis of a specified dataset with custom preprocessing.
-
Parameters
-
dataset_id (str) – Identity of a dataset attached to a project to which the user has execute permissions.
-
name (str) – Provides the analysis with a human-readable name for future reference. The name can be at most 120 characters long.
-
outcome (dict) – dictionary. Specifies both the type of analysis intended, and the dataset feature or features that are to be predicted.
Regression analysis: outcome = {'regression': 'target_variable_name'}
Classification analysis: outcome = {'classification': 'target_variable_name'}
Survival analysis: outcome = {'survival': {'event': 'event_variable_name', 'timeToEvent': 'time_to_event_variable_name'}}
-
grouping_feat (str) – Specifies an Identifier feature that groups samples which must not be split across training and test datasets during analysis, e.g. because they are repeated measurements from the same patient (optional).
-
models_considered (str) – must be either 'interpretable' or 'all'. This parameter controls the types of models considered during the search for the best one. Interpretable models include only models that are easy to interpret, such as linear models and decision trees.
-
feature_selection (str) – must be either 'mostRelevant' or 'mostRelevantOrAll' (optional).
-
thoroughness (str) – must be one of 'preliminary', 'typical', or 'extensive'. This parameter is used to reduce or expand the number of analysis configurations attempted in the search for the best ones; it significantly affects the running time of the analysis.
-
core_count (int) – Positive integer. It specifies the number of compute cores to use during the analysis, and must be at most the number of cores currently available to the user.
-
max_signature_size (int) – The maximum number of features used in a model found by the analysis. When present, it must be a positive integer. When not present, a default value of 25 is used.
-
max_visualized_signature_count (int) – The maximum number of signatures that will be prepared for visualization in the user interface. When present, it must be a positive integer. When not present, a default value of 5 is used.
-
custom_preprocessing – List of (script type, code) tuples, where the script type is 'R' or 'PYTHON' and the code is the script to be executed, given as a string.
-
-
Returns
analysis_id
-
Return type
str
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.analyze_dataset_custom_preprocessing('6067', 'file_classification',
...     {'classification': 'variable1'})
'5219'
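A sketch of passing an actual preprocessing step; the R script body is purely illustrative, since the exact contract for how the script receives and returns the dataset is not described in this document:
>>> r_code = 'data <- log2(data + 1)  # illustrative only'
>>> client.analyze_dataset_custom_preprocessing('6067', 'file_classification',
...     {'classification': 'variable1'},
...     custom_preprocessing=[('R', r_code)])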
analyze_dataset_check(dataset_id: str, name: str, outcome: dict, thoroughness: str = 'preliminary', core_count: int = 1, grouping_feat: str | None = None, models_considered: str = 'all', feature_selection: str = 'mostRelevant', max_signature_size=None, max_visualized_signature_count=None)
Check for possible errors and warnings that would arise if an analysis were run on a specified dataset.
-
Parameters
-
dataset_id (str) – Identity of a dataset attached to a project to which the user has execute permissions.
-
name (str) – Provides the analysis with a human-readable name for future reference. The name can be at most 120 characters long.
-
outcome (dict) – dictionary. Specifies both the type of analysis intended, and the dataset feature or features that are to be predicted.
Regression analysis: outcome = {'regression': 'target_variable_name'}
Classification analysis: outcome = {'classification': 'target_variable_name'}
Survival analysis: outcome = {'survival': {'event': 'event_variable_name', 'timeToEvent': 'time_to_event_variable_name'}}
-
grouping_feat (str) – Specifies an Identifier feature that groups samples which must not be split across training and test datasets during analysis, e.g. because they are repeated measurements from the same patient (optional).
-
models_considered (str) – must be either 'interpretable' or 'all'. This parameter controls the types of models considered during the search for the best one. Interpretable models include only models that are easy to interpret, such as linear models and decision trees.
-
feature_selection (str) – must be either 'mostRelevant' or 'mostRelevantOrAll' (optional).
-
thoroughness (str) – must be one of 'preliminary', 'typical', or 'extensive'. This parameter is used to reduce or expand the number of analysis configurations attempted in the search for the best ones; it significantly affects the running time of the analysis.
-
core_count (int) – Positive integer. It specifies the number of compute cores to use during the analysis, and must be at most the number of cores currently available to the user.
-
max_signature_size (int) – The maximum number of features used in a model found by the analysis. When present, it must be a positive integer. When not present, a default value of 25 is used.
-
max_visualized_signature_count (int) – The maximum number of signatures that will be prepared for visualization in the user interface. When present, it must be a positive integer. When not present, a default value of 5 is used.
-
-
Returns
{errors?: [string], warnings?: [string], suggestions?: [string]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.analyze_dataset_check('2310', 'file_classification',
... {'classification': 'target'})
{
"errors": ["SubscriptionDoesNotSupportExtensiveAnalysis",
"CoreCountLimitExceeded"],
"warnings: ["TooFewSamplesPerClassForAnalysis"]
}
analyze_dataset_extra_models(dataset_id: str, name: str, outcome: dict, extra_models: list | None = None, extra_feature_selectors: list | None = None, thoroughness: str = 'preliminary', core_count: int = 1, grouping_feat: str | None = None, models_considered: str = 'all', feature_selection: str = 'mostRelevant', max_signature_size=None, max_visualized_signature_count=None)
Initiate an analysis of a specified dataset, with additional models specified by the user. These models are trained in the analysis in addition to the models that JADBio selects using its AI system.
-
Parameters
-
dataset_id (str) – Identity of a dataset attached to a project to which the user has execute permissions.
-
name (str) – Provides the analysis with a human-readable name for future reference. The name can be at most 120 characters long.
-
outcome (dict) – dictionary. Specifies both the type of analysis intended, and the dataset feature or features that are to be predicted.
Regression analysis: outcome = {'regression': 'target_variable_name'}
Classification analysis: outcome = {'classification': 'target_variable_name'}
Survival analysis: outcome = {'survival': {'event': 'event_variable_name', 'timeToEvent': 'time_to_event_variable_name'}}
- extra_models (list) – specifies extra models along with their hyperparameters, to be run in the current analysis. Input in the form:
{ 'name': 'an algorithm name', 'parameters': { 'paramName1': paramValue1, 'paramName2': paramValue2, ... } }[]
- extra_feature_selectors (list) – specifies extra featureSelectors along with their hyperparameters, to be run in the current analysis. Input in the form:
{ 'name': 'an algorithm name', 'parameters': { 'paramName1': paramValue1, 'paramName2': paramValue2, ... } }[]
-
grouping_feat (str) – Specifies an Identifier feature that groups samples which must not be split across training and test datasets during analysis, e.g. because they are repeated measurements from the same patient (optional).
-
models_considered (str) – must be either 'interpretable' or 'all'. This parameter controls the types of models considered during the search for the best one. Interpretable models include only models that are easy to interpret, such as linear models and decision trees.
-
feature_selection (str) – must be either 'mostRelevant' or 'mostRelevantOrAll' (optional).
-
thoroughness (str) – must be one of 'preliminary', 'typical', or 'extensive'. This parameter is used to reduce or expand the number of analysis configurations attempted in the search for the best ones; it significantly affects the running time of the analysis.
-
core_count (int) – Positive integer. It specifies the number of compute cores to use during the analysis, and must be at most the number of cores currently available to the user.
-
max_signature_size (int) – The maximum number of features used in a model found by the analysis. When present, it must be a positive integer. When not present, a default value of 25 is used.
-
max_visualized_signature_count (int) – The maximum number of signatures that will be prepared for visualization in the user interface. When present, it must be a positive integer. When not present, a default value of 5 is used.
-
-
Returns
analysis_id
-
Return type
str
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> knn = [{'name': 'KNeighborsClassifier', 'parameters': {'n_neighbors': 5}}]
>>> client.analyze_dataset_extra_models('6067', 'file_classification',
... {'classification': 'variable1'}, extra_models=knn)
'5219'
analyze_dataset_extra_models_check(dataset_id: str, name: str, outcome: dict, extra_models: list | None = None, extra_feature_selectors: list | None = None, thoroughness: str = 'preliminary', core_count: int = 1, grouping_feat: str | None = None, models_considered: str = 'all', feature_selection: str = 'mostRelevant', max_signature_size=None, max_visualized_signature_count=None)
Check for possible errors and warnings that would arise if an analysis with extra algorithms were run on a specified dataset.
-
Parameters
-
dataset_id (str) – Identity of a dataset attached to a project to which the user has execute permissions.
-
name (str) – Provides the analysis with a human-readable name for future reference. The name can be at most 120 characters long.
-
outcome (dict) – dictionary. Specifies both the type of analysis intended, and the dataset feature or features that are to be predicted.
Regression analysis: outcome = {'regression': 'target_variable_name'}
Classification analysis: outcome = {'classification': 'target_variable_name'}
Survival analysis: outcome = {'survival': {'event': 'event_variable_name', 'timeToEvent': 'time_to_event_variable_name'}}
- extra_models (list) – specifies extra models along with their hyperparameters, to be run in the current analysis. Input in the form:
{ 'name': 'an algorithm name', 'parameters': { 'paramName1': paramValue1, 'paramName2': paramValue2, ... } }[]
- extra_feature_selectors (list) – specifies extra featureSelectors along with their hyperparameters, to be run in the current analysis. Input in the form:
{ 'name': 'an algorithm name', 'parameters': { 'paramName1': paramValue1, 'paramName2': paramValue2, ... } }[]
-
grouping_feat (str) – Specifies an Identifier feature that groups samples which must not be split across training and test datasets during analysis, e.g. because they are repeated measurements from the same patient (optional).
-
models_considered (str) – must be either 'interpretable' or 'all'. This parameter controls the types of models considered during the search for the best one. Interpretable models include only models that are easy to interpret, such as linear models and decision trees.
-
feature_selection (str) – must be either 'mostRelevant' or 'mostRelevantOrAll' (optional).
-
thoroughness (str) – must be one of 'preliminary', 'typical', or 'extensive'. This parameter is used to reduce or expand the number of analysis configurations attempted in the search for the best ones; it significantly affects the running time of the analysis.
-
core_count (int) – Positive integer. It specifies the number of compute cores to use during the analysis, and must be at most the number of cores currently available to the user.
-
max_signature_size (int) – The maximum number of features used in a model found by the analysis. When present, it must be a positive integer. When not present, a default value of 25 is used.
-
max_visualized_signature_count (int) – The maximum number of signatures that will be prepared for visualization in the user interface. When present, it must be a positive integer. When not present, a default value of 5 is used.
-
-
Returns
{errors?: [string], warnings?: [string], suggestions?: [string]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> knn = [{'name': 'KNeighborsClassifier', 'parameters': {'n_neighbors': 5}}]
>>> client.analyze_dataset_extra_models_check('2310', 'file_classification',
...     {'classification': 'target'}, extra_models=knn)
{
"errors": ["SubscriptionDoesNotSupportExtensiveAnalysis",
"CoreCountLimitExceeded"],
"warnings: ["TooFewSamplesPerClassForAnalysis"]
}
get_extra_models_description(outcome_type: str)
Retrieves descriptions for extra available models that can be explicitly added to an analysis. These models are then trained in the analysis in addition to the models that JADBio selects using its AI system.
-
Parameters
outcome_type (str) – must be one of 'regression', 'classification', or 'survival'. This parameter specifies the type of extra models to be retrieved.
-
Returns
{name: string, description: string, type: string, parameters: object[]}[]
-
Return type
list
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_extra_models_description('classification')
[
{'name': alg1Name,'description': alg1description,'type':alg1type,
'parameters':[{'name': param1,'description':param1descr,
'type': ['int'], 'defaultValue': 5,
'possibleValues':[{'min':minVal,'max':maxVal},...]},
...
]
},
...
]
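The returned descriptions can be used to build the extra_models argument for analyze_dataset_extra_models. A minimal sketch; it assumes every parameter entry carries a defaultValue field, as in the illustration above:
>>> algs = client.get_extra_models_description('classification')
>>> alg = algs[0]
>>> params = {p['name']: p['defaultValue'] for p in alg['parameters']}
>>> extra = [{'name': alg['name'], 'parameters': params}]
>>> analysis_id = client.analyze_dataset_extra_models('6067', 'file_classification',
...     {'classification': 'variable1'}, extra_models=extra)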
get_extra_fs_description(outcome_type: str)
Retrieves descriptions for extra available feature selectors that can be explicitly added to an analysis. These feature selectors are run in the analysis in addition to the algorithms that JADBio selects using its AI system.
-
Parameters
outcome_type (str) – must be one of 'regression', 'classification', or 'survival'. This parameter specifies the type of extra feature selectors to be retrieved.
-
Returns
{name: string, description: string, type: string, parameters: object[]}[]
-
Return type
list
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_extra_fs_description('classification')
[
{'name': alg1Name,'description': alg1description,'type':alg1type,
'parameters':[{'name': param1,'description':param1descr,
'type': ['int'], 'defaultValue': 5,
'possibleValues':[{'min':minVal,'max':maxVal},...]},
...
]
},
...
]
get_analysis(analysis_id: str)
Returns an analysis.
-
Parameters
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project.
-
Returns
{analysisId: string, projectId: string, parameters: object, state: string}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_analysis('5219')
{
'analysisId': '5219',
'projectId': '462',
'parameters': {
'coreCount': 1,
'datasetId': '6067',
'featureSelection': 'mostRelevant',
'maxSignatureSize': 25,
'maxVisualizedSignatureCount': 5,
'modelsConsidered': 'all',
'name': 'file_classification',
'outcome': {'classification': 'variable1'},
'thoroughness': 'preliminary'},
'state': 'finished'}
get_analyses(project_id: str, offset: int = 0, count: int = 10)
Returns a sublist of analyses contained in a project.
-
Parameters
-
project_id (str) – The id of the project. The user must have read permissions to the project.
-
offset (int) – Analysis list offset. The first element starts at offset 0.
-
count (int) – Limit on the number of analyses to return. No limit is applied if count is negative.
-
-
Returns
{projectId: string, offset: number, totalCount: number, data: [{analysisId: string, parameters: object, state: string}]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
Constraints: Offset must be a non-negative integer.
- Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_analyses('462')
{'projectId': '462', 'offset': 0, 'totalCount': 1, 'data': [{
'analysisId': '5219',
'projectId': '462',
'parameters': {
'coreCount': 1,
'datasetId': '6067',
'featureSelection': 'mostRelevant',
'maxSignatureSize': 25,
'maxVisualizedSignatureCount': 5,
'modelsConsidered': 'all',
'name': 'file_classification',
'outcome': {'classification': 'variable1'},
'thoroughness': 'preliminary'},
'state': 'finished'}]}
get_analysis_status(analysis_id: str)
Returns the status of an analysis.
-
Parameters
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project.
-
Returns
{analysisId: string, parameters: object, state: string, startTime?: timestamp, executionTimeInSeconds?: number, progress?: number}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
The nested parameters object has the same fields as specified when the analysis was created, including the dataset identifier and optional values.
- Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_analysis_status('5219')
{
'analysisId': '5219',
'projectId': '462',
'parameters': {
'coreCount': 1,
'datasetId': '6067',
'featureSelection': 'mostRelevant',
'maxSignatureSize': 25,
'maxVisualizedSignatureCount': 5,
'modelsConsidered': 'all',
'name': 'file_classification',
'outcome': {'classification': 'variable1'},
'thoroughness': 'preliminary'},
'state': 'finished',
'startTime': '2021-02-26T11:04:15Z',
'executionTimeInSeconds': 10}
get_analysis_result(analysis_id: str)
Returns the result of a finished analysis.
-
Parameters
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project.
-
Returns
{analysisId: string, parameters: object, mlEngine: string, startTime: timestamp, executionTimeInSeconds: number, models: { model_key1?: model1, model_key2?: model2}}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
The nested parameters object has the same fields as specified when the analysis was created, including the identity of the dataset and optional values.
The model datatype has the following form: {preprocessing: string, featureSelection: string, model: string,
signatures: string[][], performance: {key: value, ...}}.
- Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_analysis_result('5219')
{'mlEngine': 'jadbio-1.1.0', 'analysisId': '5219',
'parameters': {
'coreCount': 1,
'datasetId': '6067',
'featureSelection': 'mostRelevant',
'maxSignatureSize': 25,
'maxVisualizedSignatureCount': 5,
'modelsConsidered': 'all',
'name': 'file_classification',
'outcome': {'classification': 'variable1'},
'thoroughness': 'preliminary'},
'models': {
'best': {
'preprocessing': 'Constant Removal, Standardization',
'featureSelection': 'Test-Budgeted Statistically Equivalent
Signature (SES) algorithm with hyper-parameters:
maxK = 2, alpha = 0.05 and budget = 3 * nvars',
'model': 'Support Vector Machines (SVM) of type C-SVC with
Polynomial Kernel and hyper-parameters: cost = 1.0,
gamma = 1.0, degree = 3',
'signatures': [["variable5", "variable4"]],
"performance": {
"Area Under the ROC Curve": 0.9979193891504624,
}},
'interpretable': {
'preprocessing': 'Constant Removal, Standardization',
'featureSelection': 'Test-Budgeted Statistically Equivalent
Signature (SES) algorithm with hyper-parameters:
maxK = 2, alpha = 0.05 and budget = 3 * nvars',
'model': 'Classification Decision Tree with Deviance
splitting criterion and hyper-parameters: minimum
leaf size = 3, and pruning parameter alpha = 0.05',
'signatures': [["variable5", "variable4"]],
"performance": {
"Area Under the ROC Curve": 0.951428730938,
}},
'modelView': {
'coeffLabels': ['Class 0 vs Class 1'],
'coefficients': [
[
-1.8866120080326776,
-2.4376029017925926,
1.4961428212277295,
]
],
'featureNames': ['Intercept','f1','f2']
}
},
'startTime': '2021-02-26T11:04:15Z', 'executionTimeInSeconds': 10}
download_analysis_model_predictions(analysis_id: str, model_id: str)
Returns the out-of-sample predictions for the specified model of a finished analysis.
-
Parameters
-
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project.
-
model_id (str) – Identifies the model.
-
-
Returns
out-of-sample predictions in csv format
-
Return type
str
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.download_analysis_model_predictions('5219', 'best')
Sample name,Prob ( class = 0 ),Prob ( class = 1 ),Difficult to Predict,Label
sample1,0.03963326220876844,0.9603667377912317,false,1
delete_analysis(analysis_id: str)
Allows clients to delete a specified analysis.
-
Parameters
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project.
-
Returns
{analysisId: string, parameters: object, state: string}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
The parameters object has the same fields as specified when each analysis was created, including the dataset identifier and optional values.
- Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.delete_analysis('5219')
{'analysisId': '5219', 'projectId': '462',
'parameters': {
'coreCount': 1,
'datasetId': '6067',
'featureSelection': 'mostRelevant',
'maxSignatureSize': 25,
'maxVisualizedSignatureCount': 5,
'modelsConsidered': 'all',
'name': 'file_classification',
'outcome': {'classification': 'variable1'},
'thoroughness': 'preliminary'},
'state': 'finished'}
available_plots(analysis_id: str, model_key: str)
Retrieves the plot names of the computed plots for a given model in an analysis.
-
Parameters
-
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project.
-
model_key (str) – A key present in analysis_result['models'] (e.g. 'best' or 'interpretable')
-
-
Returns
{analysisId: string, modelKey: string, plots: string[]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
The analysisId and modelKey are the same as in the request. The plots array contains the plot names that are available for the given modelKey.
- Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.available_plots('5219', "best")
{'analysisId': '5219', 'modelKey': 'best',
'plots': ['Feature Importance', 'Progressive Feature Importance']}
get_plot(analysis_id: str, model_key: str, plot_key: str)
Retrieves the raw values of a plot for a modelKey - analysis pair.
-
Parameters
-
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project.
-
model_key (str) – A key present in analysis_result['models'] (e.g. 'best' or 'interpretable')
-
plot_key (str) – A key present in available_plots['plots'] (e.g. 'Feature Importance')
-
-
Returns
{analysisId: string, modelKey: string, plot: {plot_key: object}}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
The analysisId and modelKey are the same as in the request. The plot object contains the raw values of the requested plot.
- Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_plot('5219', "best", "Progressive Feature Importance")
{'analysisId': '5219', 'modelKey': 'best',
'plot': {
'Progressive Feature Importance': [
{
'name': ['variable5'],
'cis': [0.9826582435278086, 1.0],
'value': 0.9946595460614152},
{
'name': ['variable5', 'variable4'],
'cis': [1.0, 1.0],
'value': 1.0}]}
}
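The raw plot values can be rendered locally. matplotlib is not a dependency of the client, so the following is only a sketch assuming it is installed:
>>> import matplotlib.pyplot as plt
>>> res = client.get_plot('5219', 'best', 'Progressive Feature Importance')
>>> points = res['plot']['Progressive Feature Importance']
>>> labels = [', '.join(p['name']) for p in points]
>>> values = [p['value'] for p in points]
>>> bars = plt.bar(labels, values)
>>> plt.ylabel('Performance')
>>> plt.show()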
get_plots(analysis_id: str, model_key: str)
Retrieves the raw values of all the available plots for a modelKey - analysis pair.
-
Parameters
-
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project.
-
model_key (str) – A key present in analysis_result['models'] (e.g. 'best' or 'interpretable')
-
-
Returns
{analysisId: string, modelKey: string, plots: {plot_key: object}[]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
The analysisId and modelKey are the same as in the request. The plots array contains one raw plot object per available plot key.
- Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_plots('5219', "best")
{'analysisId': '5219', 'modelKey': 'best',
'plots': [
{'Feature Importance': [{
'name': 'variable5',
'cis': [0.0, 0.017277777777777836],
'value': '0.0053404539385848585'},
{'name': 'variable4',
'cis': [0.0, 0.017341756472191352],
'value': '0.0053404539385848585'}]},
{'Progressive Feature Importance': [{
'name': ['variable5'],
'cis': [0.9826582435278086, 1.0],
'value': 0.9946595460614152},
{'name': ['variable5', 'variable4'],
'cis': [1.0, 1.0],
'value': 1.0}]}]}
predict_outcome(analysis_id: str, dataset_id: str, model_key: str, signature_index: int = 0)
Launches a task that predicts outcome for an unlabeled dataset using a model found by a finished analysis. (User must have read and execute permissions to the project that contains the analysis and the dataset to be predicted.)
-
Parameters
-
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project
-
dataset_id (str) – The id of the dataset to predict. This must belong to the same project as the analysis.
-
model_key (str) – A key present in analysis_result['models'] (e.g. 'best' or 'interpretable')
-
signature_index (int) – zero-based index of the model signature to use for the predictions.
-
-
Returns
predictionId
-
Return type
str
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.predict_outcome('5219', '6067', 'best')
'431'
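predict_outcome only launches the prediction task; the CSV result is retrieved once the prediction reaches the 'finished' state. A minimal sketch continuing from the prediction id above:
>>> import time
>>> while client.get_prediction_status('431')['state'] != 'finished':
...     time.sleep(3)
>>> predictions_csv = client.get_prediction_result('431')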
predict_outcome_check(analysis_id: str, dataset_id: str, model_key: str, signature_index: int = 0)
Check for possible errors and warnings before predicting the outcome of an unlabeled dataset using a model found by a finished analysis. (User must have read and execute permissions to the project that contains the analysis and the dataset to be predicted.)
-
Parameters
-
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project
-
dataset_id (str) – Identifies a dataset containing unlabeled data, must belong to the same project as analysis_id.
-
model_key (str) – A key present in analysis_result['models'] (e.g. 'best' or 'interpretable')
-
signature_index (int) – zero-based index of the model signature to use for the predictions.
-
-
Returns
{errors?: [string], warnings?: [string], suggestions?: [string]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.predict_outcome_check('5219', '6067', 'best')
{
"errors": ["TestDataContainsSignatureFeatureCategoryNotInTrainingData"]
}
get_prediction(prediction_id: str)
Returns a prediction.
-
Parameters
prediction_id (str) – Identifies the prediction which must belong to a project to which the user has read access.
-
Returns
{predictionId: string, projectId: string, parameters: object, state: string}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_prediction('431')
{
'projectId': '462',
'predictionId': '431',
'parameters': {'analysisId': '5219',
'modelKey': 'best',
'signatureIndex': 0,
'datasetId': '6067'
},
'state': 'finished'
}
get_predictions(analysis_id: str, offset: int = 0, count: int = 10)
Returns a sublist of the predictions from an analysis.
-
Parameters
-
analysis_id (str) – The id of the analysis. The user must have read permissions to the corresponding project.
-
offset (int) – Predictions list offset. The first element starts at offset 0.
-
count (int) – Limit on the number of predictions to return. No limit is applied if count is negative.
-
-
Returns
{analysisId: string, offset: number, totalCount: number, data: [{predictionId: string, projectId: string, parameters: object, state: string}]}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
Constraints: Offset must be a non-negative integer.
- Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_predictions('5219')
{
'analysisId': '5219',
'offset': 0,
'totalCount': 1,
'data': [{
'projectId': '462',
'predictionId': '431',
'parameters': {'analysisId': '5219',
'modelKey': 'best',
'signatureIndex': 0,
'datasetId': '6067'
},
'state': 'finished'
}],
}
get_prediction_status(prediction_id: str)
Returns the status of a prediction.
-
Parameters
prediction_id (str) – Identifies a prediction in a project to which the user has read permissions.
-
Returns
{predictionId: string, state: string, progress?: number}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_prediction_status('431')
{'predictionId': '431', 'state': 'finished'}
get_prediction_result(prediction_id: str)
Downloads the result of a finished prediction task.
-
Parameters
prediction_id (str) – Identifies a prediction in a project to which the user has read permissions.
-
Returns
predictions in csv format
-
Return type
str
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.get_prediction_result('431')
"Sample name,Prob ( class = firstcat ),Prob ( class = scndcat ),
Prob ( class = thirdCat )
1,0.9890362674058242,0.0046441918237983175,0.006319540770377525
2,0.9890362674058242,0.0046441918237983175,0.006319540770377525
3,0.9930412181618086,0.0031263130498427327,0.003832468788348629
...
...
..."
delete_prediction(prediction_id: str)
Allows clients to delete a specified prediction.
-
Parameters
prediction_id (str) – Identifies the prediction. It must belong to a project to which the user has write permissions.
-
Returns
{predictionId: string, projectId: string, parameters: object, state: string}
-
Return type
dict
-
Raises
RequestFailed, JadRequestResponseError – Exception in case something goes wrong with a request.
-
Example
>>> client = JadbioClient('[email protected]', 'a password')
>>> client.delete_prediction('431')
{'projectId': '462', 'predictionId': '431',
'parameters': {'analysisId': '5219',
'modelKey': 'best',
'signatureIndex': 0,
'datasetId': '6067'
},
'state': 'finished'
}