Python
Printing a tree
def print_tree(tree, indent=2, level=0):
    for name, child in tree.items():
        print(' ' * indent * level + name)
        print_tree(child, indent, level + 1)
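A quick usage check with a nested-dict tree (hypothetical data, matching the shape the function expects):

tree = {'root': {'a': {'a1': {}}, 'b': {}}}
print_tree(tree)
# root
#   a
#     a1
#   b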
Extracting argument names from a function
import inspect
inspect.getfullargspec(<func>).args
In [1]: import inspect
In [2]: def f(x, y):
   ...:     pass
   ...:
In [3]: inspect.getfullargspec(f).args
Out[3]: ['x', 'y']
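inspect.signature is the more general API; its parameters mapping preserves declaration order:

In [4]: list(inspect.signature(f).parameters)
Out[4]: ['x', 'y']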
Why avoid initializing Decimal from float
The float 0.1 already carries binary floating-point rounding error, and Decimal reproduces it exactly; build from a string to get the value you meant.
In [1]: from decimal import Decimal
In [2]: Decimal(0.1)
Out[2]: Decimal('0.1000000000000000055511151231257827021181583404541015625')
In [3]: Decimal('0.1')
Out[3]: Decimal('0.1')
Why avoid using mutable objects as default args
Default values are evaluated once, at function definition time, so every call that omits d shares the same dict.
In [1]: def f(k, v, d={}):
   ...:     d[k] = v
   ...:     return d
In [2]: f("x", 1)
Out[2]: {'x': 1}
In [3]: f("y", 2)
Out[3]: {'x': 1, 'y': 2}
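The usual fix is a None sentinel, so a fresh dict is created on every call:

def f(k, v, d=None):
    if d is None:  # runs per call, unlike the default-value expression
        d = {}
    d[k] = v
    return d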
Merging PDFs
from pypdf import PdfWriter
w = PdfWriter()
w.append("first.pdf")
w.append("second.pdf")
w.write("merged.pdf")
w.close()
Updating a list in-place
l2 = l1 = [1, 2]
l2[:] = ['a', 'b']
print(l1 is l2, l1, l2)
# True ['a', 'b'] ['a', 'b']
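For contrast, rebinding the name instead of slice-assigning breaks the aliasing:

l2 = l1 = [1, 2]
l2 = ['a', 'b']  # rebinds l2 to a new list; l1 keeps the old one
print(l1 is l2, l1, l2)
# False [1, 2] ['a', 'b']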
Moving from pip to Poetry
poetry init
cat requirements.txt | cut -d '=' -f 1 | xargs poetry add
cat requirements-dev.txt | cut -d '=' -f 1 | xargs poetry add --group=dev
rm requirements.txt requirements-dev.txt
poetry install
poetry run <command>
Installing requirements from git + ssh
pip install git+ssh://git@github.com/<org>/<repo>
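pip's VCS URL syntax also accepts a branch, tag, or commit appended with @:

pip install git+ssh://git@github.com/<org>/<repo>@<ref>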
Splitting CLI args given as a string
>>> "--system 'string w/ multiple words'".split()
['--system', "'string", 'w/', 'multiple', "words'"]
>>> import shlex
>>> shlex.split("--system 'string w/ multiple words'")
['--system', 'string w/ multiple words']
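Since Python 3.8, shlex.join is the inverse, re-quoting arguments as needed:

>>> shlex.join(['--system', 'string w/ multiple words'])
"--system 'string w/ multiple words'"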
Sending arguments into a generator
The generator must first be primed with next(), which runs it up to the first yield; after that, send() delivers a value as the result of that yield.
In [1]: def generator():
   ...:     while True:
   ...:         received = yield 'DATA'
   ...:         print('Received:', received)
   ...:
In [2]: g = generator()
In [3]: next(g)
Out[3]: 'DATA'
In [4]: g.send(1)
Received: 1
Out[4]: 'DATA'
In [5]: g.send(2)
Received: 2
Out[5]: 'DATA'
Printing logs when running pytest
pytest --log-cli-level DEBUG
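The same behavior can be enabled permanently in pytest's configuration; a pytest.ini sketch (the equivalent keys also work under [tool.pytest.ini_options] in pyproject.toml):

[pytest]
log_cli = true
log_cli_level = DEBUG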
Why use fixtures instead of module-level variables for mocked data
# without fixture
MOCK_DATA = [{"field": "value"}]

def test_one():
    MOCK_DATA[0]['field'] = 'other value'
    assert MOCK_DATA[0]['field'] == 'other value'

def test_two():
    assert MOCK_DATA[0]['field'] == 'value'

# with fixture
import pytest

@pytest.fixture
def mock_data():
    return [{"field": "value"}]

def test_three(mock_data):
    mock_data[0]['field'] = 'other value'
    assert mock_data[0]['field'] == 'other value'

def test_four(mock_data):
    assert mock_data[0]['field'] == 'value'
Running the tests in file order, test_two fails because test_one mutated the shared MOCK_DATA:

    def test_two():
>       assert MOCK_DATA[0]['field'] == 'value'
E       AssertionError: assert 'other value' == 'value'
E         - value
E         + other value

path/to/tests/test_zero.py:15: AssertionError
===================== 1 failed, 3 passed in 0.18s =====================
Why use spec when using mocks?
Without a spec, a Mock accepts any attribute access, so typos and renamed methods pass silently.
from unittest.mock import Mock
class MyClass:
    pass
# without spec
Mock().wrong_method()
# Out: <Mock name='mock.wrong_method()' id='140607049530000'>
# with spec
Mock(spec=MyClass).wrong_method()
# raises "AttributeError: Mock object has no attribute 'wrong_method'"
Why use autospec when using mocks?
spec only constrains the top-level mock: attributes of the mock are plain Mocks again. create_autospec applies the spec recursively.
from unittest.mock import create_autospec, Mock
class MyClass:
    myobj = object
# without autospec
Mock(spec=MyClass).myobj.wrong_method()
# Out: <Mock name='mock.myobj.wrong_method()' id='140671042320272'>
# with autospec
create_autospec(MyClass).myobj.wrong_method()
# raises "AttributeError: Mock object has no attribute 'wrong_method'"
Anti-Patterns
Auth
- Authlib - The ultimate library in building OAuth and OpenID Connect servers
Data
pandas
- https://www.pola.rs/ - Lightning-fast DataFrame library for Rust and Python
- axis: 0 = rows and 1 = columns
- pandas-profiling - PyPI, docs
- General
df.shape                           # (rows, columns)
df.info()
df.High.mean()                     # mean of the High column
df.Date = pd.to_datetime(df.Date)  # convert column to datetime
- Statistical information
df.describe()           # statistical summary
df.ride_duration.std()  # standard deviation of the ride_duration column
- Visualization
df.High.plot()               # line plot of the High column
df.Volume.hist()             # histogram of the Volume column
df.plot.scatter('c1', 'c2')  # scatter plot
df.Low.plot(kind='box')      # box plot
- Missing values
df.isnull().sum()                          # count missing values per column
df.isnull().sum() / df.shape[0]            # % of missing values
df.dropna(subset=['user_gender'], axis=0)  # drop rows where user_gender is NaN
Dataclasses
- attrs vs pydantic: Why I use attrs instead of pydantic
Strings
Formatting
% operator (Tweet)
%s: String conversion.
%d or %i: Integer conversion.
%f: Float conversion.
%o: Octal conversion.
%x or %X: Hexadecimal conversion.
%e or %E: Exponential notation conversion.
In [1]: "%s %d %f %o %x %e" % ("a", 1, 1.0, 8, 16, 100)
Out[1]: 'a 1 1.000000 10 10 1.000000e+02'
f-string
Source: https://fstring.help/
- debugging (Tweet)
user = "eric_idle"
f"{user=}"    # "user='eric_idle'"
f"{user = }"  # "user = 'eric_idle'"
- padding (Tweet)
value = "test"
f"{value:>10}"   # '      test'
f"{value:<10}"   # 'test      '
f"{value:_<10}"  # 'test______'
f"{value:^10}"   # '   test   '
- date
>>> from datetime import datetime
>>> d = datetime.now()
>>> f'{d:%Y-%m-%d}'
'2024-09-27'
Troubleshooting
- "[extras.pipfile_deprecated_finder.2] 'pip-shims<=0.3.4' does not match '^[a-zA-Z-_.0-9]+$'": fixed by running pre-commit autoupdate #troubleshooting
- How to enable relative line numbers in PyCharm?
Toolbox
- Background Tasks: Moved to My Toolbox - Python - Background Tasks
- CLI: Moved to My Toolbox - Python - CLI
- JSON: Moved to My Toolbox - Python - JSON
- ORM: Moved to My Toolbox - Python - ORMs
- RPC: Moved to My Toolbox - Python - RPC
- Text Parsing: Moved to My Toolbox - Python - Text Parsing
- GraphQL Server: Moved to My Toolbox - Django - GraphQL
- WebAssembly: Moved to My Toolbox - Python - WebAssembly
- https://github.com/haralyzer/haralyzer/ - Lib to read HAR files #tools
Cache
- https://github.com/grantjenks/python-diskcache
- https://github.com/uqfoundation/klepto - persistent caching to memory, disk, or database
Keycloak
- https://www.baeldung.com/postman-keycloak-endpoints
- https://github.com/marcospereirampj/python-keycloak/
- unnecessarily complex
- some strange evals: https://github.com/marcospereirampj/python-keycloak/blob/8fd315d11a42a8b4afebfe84498e882bc0b736c8/keycloak/authorization/__init__.py#L78-L91
Profiling
Profiler | What | Granularity | How
---|---|---|---
timeit | run time | snippet-level |
cProfile | run time | method-level | deterministic
statprof.py | run time | method-level | statistical
line_profiler | run time | line-level | deterministic
memory_profiler | memory | line-level | +- deterministic
pympler | memory | method-level | deterministic

Source: https://www.youtube.com/watch?v=DUCMjsrYSrQ
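For the snippet-level row, the standard-library timeit is enough; a minimal example:

import timeit

# total seconds to run the statement 10,000 times
timeit.timeit("sorted(range(100))", number=10_000)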
PyPI mirror
requests
Retry
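A minimal sketch of retrying failed requests with urllib3's Retry mounted on a Session (the status list and backoff values here are illustrative, not recommendations):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=3,                                     # at most 3 retries
    backoff_factor=0.5,                          # sleep 0.5s, 1s, 2s between attempts
    status_forcelist=[429, 500, 502, 503, 504],  # retry on these HTTP statuses
)
session.mount("https://", HTTPAdapter(max_retries=retries))
response = session.get("https://example.com/")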
SSH
- Paramiko - Homepage; Docs; Source
  - It doesn't support SOCKS5 proxy (ssh -D) - issue; third-party PR
  - Port forward example
- sshtunnel - SSH tunnels to remote server
Pipelines
AI/Data
- dagster: https://github.com/dagster-io/dagster
- cloud-native data pipeline orchestrator … integrated lineage and observability …, and best-in-class testability.
- designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports.
- cloud: https://dagster.io/
- vs. Airflow
- vs. Prefect
- Metaflow: https://github.com/Netflix/metaflow
- Prefect: https://github.com/PrefectHQ/prefect
- orchestrator for data-intensive workflows.
- build and observe resilient data workflows so that you can understand, react to, and recover from unexpected changes.
- vs. Dagster
- Pydra: https://github.com/nipype/pydra
- A simple dataflow engine with scalable semantics.
- Ray: https://github.com/ray-project/ray
- unified framework for scaling AI
- consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for simplifying ML compute
General
- Airflow
- ⭐️ Joblib
- set of tools to provide lightweight pipelining.
- Main features: disk-caching; parallel helper; fast compressed persistence.
- How does the cache work? It hashes the arguments to detect cache hits (see the sketch at the end of this section)
- Luigi
- helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.
- Mara
- Principles: Data integration pipelines as code; PostgreSQL as a data processing engine; Extensive web ui; No in-app data processing; multiprocessing - single machine pipeline execution; nodes with higher cost are run first
- Mistral
- integrated with OpenStack
- define tasks and workflows in a simple YAML and a distributed environment
- Ploomber - Docs
- pygrametl
- provides commonly used functionality for the development of ETL processes.
- Pypeln
- for creating concurrent data pipelines
- Main Features: Simple; Easy-to-use; Flexible; Fine-grained Control.
- Queues: Process; Thread; Task.
- ⭐️ pypyr
- task runner for automation pipelines
- script sequential task workflow steps in yaml
- conditional execution, loops, error handling & retries
- SCOOP
- distributed task module allowing concurrent parallel programming on various environments, from heterogeneous grids to supercomputers.
- designed from the following ideas: the future is parallel; simple is beautiful; parallelism should be simpler.
- brokers: TCP and ZeroMQ
- SpiffWorkflow
- workflow engine implemented in pure Python.
- support the development of low-code business applications in Python. Using BPMN will allow non-developers to describe complex workflow processes in a visual diagram
- Built with: lxml; celery.
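A minimal sketch of the Joblib disk cache referenced above (the cache directory is illustrative):

from joblib import Memory

memory = Memory('/tmp/joblib_cache', verbose=0)

@memory.cache
def expensive(x):
    print('computing...')
    return x ** 2

expensive(4)  # computes and stores the result on disk
expensive(4)  # cache hit: the hashed arguments match, so no recomputation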