Purview | Custom process

How to create a custom Purview dataset and process with Python
purview
python
jupyter
Author

Davide Fornelli

Published

December 14, 2021

Prerequisites

%pip install pyapacheatlas
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9 pyapacheatlas-0.10.0
Note: you may need to restart the kernel to use updated packages.

Code

Import

from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core import AtlasProcess
from pyapacheatlas.core.client import PurviewClient
from pyapacheatlas.core.entity import AtlasEntity
from pyapacheatlas.core.typedef import (
    AtlasAttributeDef,
    Cardinality,
    ChildEndDef,
    EntityTypeDef,
    ParentEndDef,
    RelationshipTypeDef,
)
from pyapacheatlas.core.util import GuidTracker

Settings

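# Azure AD service principal credentials and the Purview account name.
# Fill these in before running, and avoid committing real secrets.
tenant_id = ""
client_id = ""
client_secret = ""
purview_name = ""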
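Instead of hardcoding the four settings in the notebook, they can be read from environment variables. The following is a minimal sketch and not part of the original notebook: the variable names (`PURVIEW_TENANT_ID` and so on) and the `load_settings` helper are hypothetical, so rename them to match your environment.

```python
import os

# Hypothetical variable names, chosen for this example only.
REQUIRED_VARS = (
    "PURVIEW_TENANT_ID",
    "PURVIEW_CLIENT_ID",
    "PURVIEW_CLIENT_SECRET",
    "PURVIEW_NAME",
)


def load_settings(env=None):
    """Return the four settings as a dict, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Usage (left commented out so the cell does not fail when the variables are unset):
# settings = load_settings()
# tenant_id = settings["PURVIEW_TENANT_ID"]
# client_id = settings["PURVIEW_CLIENT_ID"]
# client_secret = settings["PURVIEW_CLIENT_SECRET"]
# purview_name = settings["PURVIEW_NAME"]
```

Failing early with the list of missing names tends to be easier to debug than an authentication error later on.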

Clients

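# Authenticate as the service principal defined in the settings.
atlas_sp = ServicePrincipalAuthentication(
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret
)
# Client for the Purview account's Atlas endpoint.
purview_client = PurviewClient(
    account_name=purview_name,
    authentication=atlas_sp
)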

Entities

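# Hands out temporary placeholder GUIDs; the service assigns the real ones on upload.
guid_tracker = GuidTracker()

# Register a custom entity type that inherits from the built-in DataSet type.
# force_update=True updates the type if it already exists.
custom_dataset = purview_client.upload_typedefs(
    entityDefs=[
        EntityTypeDef(
            name="myCustomDataSet",
            superTypes=["DataSet"]
        )
    ],
    force_update=True
)

# Two instances of the custom type: the input and the output of the process.
myCustomDataset01 = AtlasEntity(
    name="myCustomDataset01",
    typeName="myCustomDataSet",
    qualified_name="pyapacheatlas://mycustomdataset01",
    guid=guid_tracker.get_guid()
)
myCustomDataset02 = AtlasEntity(
    name="myCustomDataset02",
    typeName="myCustomDataSet",
    qualified_name="pyapacheatlas://mycustomdataset02",
    guid=guid_tracker.get_guid()
)

# A process of the built-in Process type that links the input to the output.
myCustomProcess01 = AtlasProcess(
    name="myCustomProcess01",
    typeName="Process",
    qualified_name="pyapacheatlas://mycustomprocess01",
    inputs=[myCustomDataset01],
    outputs=[myCustomDataset02],
    guid=guid_tracker.get_guid()
)

# Upload all three entities in a single batch.
results = purview_client.upload_entities(
    batch=[myCustomDataset01, myCustomDataset02, myCustomProcess01]
)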
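After the upload it can be useful to see which permanent GUID each entity received. The sketch below assumes that `results` is the parsed JSON response of the Atlas bulk entity endpoint and that it contains a `guidAssignments` mapping from temporary to assigned GUIDs. The `map_assigned_guids` helper is hypothetical, and the sample response is invented purely to illustrate the shape; it is not real output.

```python
def map_assigned_guids(results, entities):
    """Map each entity name to the GUID assigned by the service.

    `entities` is a list of (name, temporary_guid) pairs.
    Names without an assignment map to None.
    """
    assignments = results.get("guidAssignments", {})
    return {
        name: assignments.get(str(temp_guid))
        for name, temp_guid in entities
    }


# Invented response, for illustration only (assumed shape, placeholder values).
sample_results = {
    "guidAssignments": {
        "-1000": "placeholder-guid-a",
        "-1001": "placeholder-guid-b",
    }
}
sample_entities = [("myCustomDataset01", -1000), ("myCustomDataset02", -1001)]
assigned = map_assigned_guids(sample_results, sample_entities)
```

With the notebook's own objects, the pairs could be built as `[(e.attributes["name"], e.guid) for e in (myCustomDataset01, myCustomDataset02, myCustomProcess01)]`, assuming the entities expose `attributes` and `guid` as they do in pyapacheatlas 0.10.0.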

Result

Call graph

Utils

Delete entities

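# Remove the custom type definition registered above.
# Atlas may reject this while entities of the type still exist.
purview_client.delete_typedefs(
    entityDefs=[
        {"name": "myCustomDataSet"},
    ]
)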
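The call above removes a type definition. To remove the uploaded entities themselves, one option is to look each one up by qualified name and delete it by GUID. This is a minimal sketch: the `delete_entities_by_qualified_name` helper is hypothetical, and it assumes the client's `get_entity(qualifiedName=..., typeName=...)` returns a dict with an `entities` list and that `delete_entity(guid=...)` accepts a single GUID, as in pyapacheatlas 0.10.0. Depending on the service, a lookup for a missing entity may raise instead of returning an empty list.

```python
def delete_entities_by_qualified_name(client, type_name, qualified_names):
    """Look up each entity by qualified name and delete it by GUID.

    Returns the list of GUIDs that were passed to delete_entity.
    """
    deleted = []
    for qualified_name in qualified_names:
        found = client.get_entity(
            qualifiedName=qualified_name,
            typeName=type_name,
        )
        for entity in found.get("entities", []):
            client.delete_entity(guid=entity["guid"])
            deleted.append(entity["guid"])
    return deleted


# Usage with the notebook's client (commented out because it needs a live account):
# delete_entities_by_qualified_name(
#     purview_client, "Process", ["pyapacheatlas://mycustomprocess01"]
# )
# delete_entities_by_qualified_name(
#     purview_client,
#     "myCustomDataSet",
#     ["pyapacheatlas://mycustomdataset01", "pyapacheatlas://mycustomdataset02"],
# )
```

Deleting the process before the datasets, and the datasets before their type definition, is a cautious ordering rather than a documented requirement.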