Python

Printing a tree

def print_tree(tree, indent=2, level=0):
    for name, child in tree.items():
        print(' '*indent*level + name)
        print_tree(child, indent, level+1)
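
A quick usage sketch (assumption: the tree is a dict of dicts, with empty dicts as leaves):

tree = {"src": {"app": {"models.py": {}, "views.py": {}}, "tests": {}}}
print_tree(tree)
# src
#   app
#     models.py
#     views.py
#   tests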

Extracting argument names from a function

import inspect
inspect.getfullargspec(<func>).args
In [1]: import inspect

In [2]: def f(x,y):
    ...:     pass
    ...:

In [3]: inspect.getfullargspec(f).args
Out[3]: ['x', 'y']
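
inspect.signature is an alternative that also lists keyword-only parameters; a minimal sketch:

import inspect

def f(x, y, *, z=1):
    pass

list(inspect.signature(f).parameters)
# ['x', 'y', 'z']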

Why to avoid initializing Decimal from a float

A float literal like 0.1 is already stored as a binary approximation, so Decimal faithfully copies that error; construct from a string instead to get the value you meant.

In [1]: from decimal import Decimal

In [2]: Decimal(0.1)
Out[2]: Decimal('0.1000000000000000055511151231257827021181583404541015625')

In [3]: Decimal('0.1')
Out[3]: Decimal('0.1')

Why to avoid using mutable objects as default args

Default values are evaluated once, at function definition time, so every call shares the same dict:

In [1]: def f(k, v, d={}):
   ...:    d[k] = v
   ...:    return d

In [2]: f("x", 1)
Out[2]: {'x': 1}

In [3]: f("y", 2)
Out[3]: {'x': 1, 'y': 2}
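
The usual fix is a None sentinel, so a fresh dict is created on every call; a minimal sketch:

def f(k, v, d=None):
    if d is None:
        d = {}  # new dict per call instead of one shared default
    d[k] = v
    return d

f("x", 1)  # {'x': 1}
f("y", 2)  # {'y': 2} -- no leftover 'x'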

Merging PDFs

from pypdf import PdfWriter
w = PdfWriter()
w.append("first.pdf")
w.append("second.pdf")
w.write("merged.pdf")
w.close()
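
The same API merges any number of files in a loop; a sketch assuming the PDFs live in the current directory:

from pathlib import Path
from pypdf import PdfWriter

w = PdfWriter()
for path in sorted(Path(".").glob("*.pdf")):
    w.append(str(path))  # append all pages of each input
w.write("merged.pdf")
w.close()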

Updating list in-place

Slice assignment replaces the contents of the existing list object instead of rebinding the name, so every alias sees the change:

l2 = l1 = [1, 2]
l2[:] = ['a', 'b']
print(l1 is l2, l1, l2)
# True ['a', 'b'] ['a', 'b']
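
Compare with plain assignment, which rebinds the name and breaks the aliasing:

l2 = l1 = [1, 2]
l2 = ['a', 'b']  # rebinds l2 only; l1 still points at the old object
print(l1 is l2, l1, l2)
# False [1, 2] ['a', 'b']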

Moving from pip to Poetry

poetry init
cut -d '=' -f 1 requirements.txt | xargs poetry add  # strip version pins, add each package
cut -d '=' -f 1 requirements-dev.txt | xargs poetry add --group=dev
rm requirements.txt requirements-dev.txt
poetry install
poetry run <command>

Installing requirements from git + ssh

pip install git+ssh://git@github.com/<org>/<repo>

Splitting CLI args from a string

>>> "--system 'string w/ multiple words'".split()
['--system', "'string", 'w/', 'multiple', "words'"]

>>> import shlex
>>> shlex.split("--system 'string w/ multiple words'")
['--system', 'string w/ multiple words']
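
The inverse also exists: shlex.join (Python 3.8+) quotes each piece so the command round-trips:

>>> import shlex
>>> args = ['--system', 'string w/ multiple words']
>>> shlex.join(args)
"--system 'string w/ multiple words'"
>>> shlex.split(shlex.join(args)) == args
True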

Sending arguments into a generator

g.send(value) resumes the generator so that the paused yield expression evaluates to value; the generator must first be primed with next(g) (or g.send(None)) to reach its first yield.

In [1]: def generator():
    ...:     while True:
    ...:         received = yield 'DATA'
    ...:         print('Received:', received)
    ...:

In [2]: g = generator()

In [3]: next(g)
Out[3]: 'DATA'

In [4]: g.send(1)
Received: 1
Out[4]: 'DATA'

In [5]: g.send(2)
Received: 2
Out[5]: 'DATA'
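
g.send(None) is the only value a just-started generator accepts, so it can replace the priming next(g); a sketch reusing the generator above:

g = generator()
g.send(None)  # same as next(g): run to the first yield
g.send(3)     # prints "Received: 3" and yields 'DATA' again
g.close()     # raises GeneratorExit at the paused yield, ending the generator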

Printing logs when running pytest

pytest --log-cli-level DEBUG
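
To make this the default, the documented log_cli/log_cli_level ini options can go into pytest.ini (sketch):

[pytest]
log_cli = true
log_cli_level = DEBUG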

Why to use fixtures instead of module-level variables for mocked data

A module-level MOCK_DATA is shared state: any test that mutates it leaks the change into later tests, while a fixture rebuilds the data for each test.

# without fixture

MOCK_DATA = [{"field": "value"}]


def test_one():
    MOCK_DATA[0]['field'] = 'other value'
    assert MOCK_DATA[0]['field'] == 'other value'


def test_two():
    assert MOCK_DATA[0]['field'] == 'value'


# with fixture

import pytest


@pytest.fixture
def mock_data():
    return [{"field": "value"}]


def test_three(mock_data):
    mock_data[0]['field'] = 'other value'
    assert mock_data[0]['field'] == 'other value'


def test_four(mock_data):
    assert mock_data[0]['field'] == 'value'

Running these tests, test_two fails because test_one mutated the shared MOCK_DATA:

    def test_two():
>       assert MOCK_DATA[0]['field'] == 'value'
E       AssertionError: assert 'other value' == 'value'
E         - value
E         + other value

path/to/tests/test_zero.py:15: AssertionError
===================== 1 failed, 3 passed in 0.18s =====================

Why to use spec when using mocks?

A bare Mock accepts any attribute or method call; spec restricts the mock to the API of the given class.

from unittest.mock import Mock

class MyClass:
    pass

# without spec
Mock().wrong_method()
# Out: <Mock name='mock.wrong_method()' id='140607049530000'>

# with spec
Mock(spec=MyClass).wrong_method()
# raises "AttributeError: Mock object has no attribute 'wrong_method'"
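
spec still allows setting brand-new attributes on the mock; spec_set forbids that as well:

from unittest.mock import Mock

class MyClass:
    pass

m = Mock(spec=MyClass)
m.new_attr = 1  # allowed: spec restricts reads/calls, not attribute assignment

m = Mock(spec_set=MyClass)
m.new_attr = 1  # raises AttributeError: spec_set also forbids setting unknown attributes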

Why to use autospec when using mocks?

spec only constrains the top-level mock; create_autospec applies the spec recursively, so attributes are spec'd too.

from unittest.mock import create_autospec, Mock

class MyClass:
    myobj = object

# without autospec
Mock(spec=MyClass).myobj.wrong_method()
# Out: <Mock name='mock.myobj.wrong_method()' id='140671042320272'>

# with autospec
create_autospec(MyClass).myobj.wrong_method()
# raises "AttributeError: Mock object has no attribute 'wrong_method'"
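
The same recursive checking is available when patching; autospec=True makes mock.patch build the autospec for you (a sketch, patching MyClass in the current module):

from unittest import mock

class MyClass:
    myobj = object

with mock.patch(f"{__name__}.MyClass", autospec=True) as m:
    m.myobj.wrong_method()  # raises AttributeError: attributes are spec'd recursively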

Anti-Patterns

Auth

  • Authlib - The ultimate library in building OAuth and OpenID Connect servers

Data

pandas

  • https://www.pola.rs/ - Lightning-fast DataFrame library for Rust and Python
  • axis: 0 = rows, 1 = columns
  • pandas-profiling - PyPI, docs
  • General
    df.shape  # (rows, columns)
    df.info()
    df.High.mean()  # mean of the High column
    df.Date = pd.to_datetime(df.Date)  # convert column to datetime
    
  • Statistical information
    df.describe()  # summary statistics
    df.ride_duration.std()  # standard deviation of the ride_duration column
    
  • Visualization
    df.High.plot()  # line plot of the High column
    df.Volume.hist()  # histogram of the Volume column
    df.plot.scatter('c1', 'c2')  # scatter plot
    df.Low.plot(kind='box')  # box plot
    
  • Missing values (see the runnable sketch after this list)
    df.isnull().sum()  # NaN count per column
    df.isnull().sum() / df.shape[0]  # fraction of missing values per column
    df.dropna(subset=['user_gender'], axis=0)  # drop rows where user_gender is NaN
    
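A runnable sketch of the calls above on toy data (the column names mirror the examples; the data is invented):

import pandas as pd

df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02"],
    "High": [10.0, 12.5],
    "user_gender": ["F", None],
})
df.shape                                 # (2, 3) -> (rows, columns)
df.Date = pd.to_datetime(df.Date)
df.High.mean()                           # 11.25
df.isnull().sum()                        # NaN count per column
df = df.dropna(subset=["user_gender"])   # drops the row with NaN user_gender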

Dataclasses

Strings

Formatting

% operator

  • %s: String conversion.
  • %d or %i: Integer conversion.
  • %f: Float conversion.
  • %o: Octal conversion.
  • %x or %X: Hexadecimal conversion.
  • %e or %E: Exponential notation conversion.
In [1]: "%s %d %f %o %x %e" % ("a", 1, 1.0, 8, 16, 100)
Out[1]: 'a 1 1.000000 10 10 1.000000e+02'

f-string

Source: https://fstring.help/

  • debugging
    user = "eric_idle"
    f"{user=}"
    # "user='eric_idle'"
    f"{user = }"
    # "user = 'eric_idle'"
    
  • padding
    value = "test"
    f"{value:>10}"
    # '      test'
    f"{value:<10}"
    # 'test      '
    f"{value:_<10}"
    # 'test______'
    f"{value:^10}"
    # '   test   '
    
  • date
    >>> from datetime import datetime
    >>> d = datetime.now()
    >>> f'{d:%Y-%m-%d}'
    '2024-09-27'
    
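  • numbers (added sketch: precision and thousands separators use the same format-spec mini-language)
    >>> f"{1234.5678:.2f}"
    '1234.57'
    >>> f"{1234567:,}"
    '1,234,567'
    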

Troubleshooting

Toolbox

Cache

Keycloak

Profiling

Profiler         What      Granularity    How
---------------  --------  -------------  ----------------
timeit           run time  snippet-level
cProfile         run time  method-level   deterministic
statprof.py      run time  method-level   statistical
line_profiler    run time  line-level     deterministic
memory_profiler  memory    line-level     +- deterministic
pympler          memory    method-level   deterministic

Source: https://www.youtube.com/watch?v=DUCMjsrYSrQ
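
A taste of two rows from the table, using only the stdlib:

import cProfile
import timeit

# timeit: snippet-level run time
print(timeit.timeit("sum(range(1000))", number=10_000))

# cProfile: method-level deterministic profile
cProfile.run("sum(range(1000))")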

PyPI mirror

requests

Retry

SSH

Pipelines

AI/Data

General

  • Airflow
  • ⭐️ Joblib
    • set of tools to provide lightweight pipelining.
    • Main features: disk-caching; parallel helper; fast compressed persistence.
    • How does the cache work? It hashes the arguments to detect changes (see the sketch after this list).
  • Luigi
    • helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.
  • Mara
    • Principles: data integration pipelines as code; PostgreSQL as a data processing engine; extensive web UI; no in-app data processing; multiprocessing-based single-machine pipeline execution; nodes with higher cost are run first.
  • Mistral
    • integrated with OpenStack
    • define tasks and workflows in simple YAML; run them in a distributed environment
  • Ploomber - Docs
  • pygrametl
    • provides commonly used functionality for the development of ETL processes.
  • Pypeln
    • for creating concurrent data pipelines
    • Main features: simple; easy-to-use; flexible; fine-grained control.
    • Queues: Process; Thread; Task.
  • ⭐️ pypyr
    • task runner for automation pipelines
    • scripts sequential workflow steps in YAML
    • conditional execution, loops, error handling & retries
  • SCOOP
    • distributed task module allowing concurrent parallel programming on various environments, from heterogeneous grids to supercomputers.
    • designed around the following ideas: the future is parallel; simple is beautiful; parallelism should be simpler.
    • brokers: TCP and ZeroMQ
  • SpiffWorkflow
    • workflow engine implemented in pure Python.
    • supports the development of low-code business applications in Python; BPMN lets non-developers describe complex workflow processes in a visual diagram.
    • built with: lxml; celery.
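
A minimal sketch of Joblib's disk cache mentioned above (assumes joblib is installed; "cachedir" is an arbitrary local path):

from joblib import Memory

memory = Memory("cachedir", verbose=0)

@memory.cache
def slow_square(x):
    print("computing...")
    return x * x

slow_square(4)  # runs and caches; the cache key is a hash of the arguments
slow_square(4)  # cache hit: served from disk, nothing printed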