Python
Printing a tree
def print_tree(tree, indent=2, level=0):
    for name, child in tree.items():
        print(' ' * indent * level + name)
        print_tree(child, indent, level + 1)
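A quick usage check with a nested-dict tree (hypothetical data, matching the shape the function expects):

tree = {'root': {'a': {'a1': {}}, 'b': {}}}
print_tree(tree)
# root
#   a
#     a1
#   b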
Extracting argument names from a function
import inspect
inspect.getfullargspec(<func>).args
In [1]: import inspect
In [2]: def f(x, y):
   ...:     pass
   ...:
In [3]: inspect.getfullargspec(f).args
Out[3]: ['x', 'y']
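inspect.signature is the more general API; its parameters mapping preserves declaration order:

In [4]: list(inspect.signature(f).parameters)
Out[4]: ['x', 'y']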
Why avoid initializing Decimal from float
The float 0.1 already carries binary floating-point rounding error, and Decimal reproduces it exactly; build from a string to get the value you meant.
In [1]: from decimal import Decimal
In [2]: Decimal(0.1)
Out[2]: Decimal('0.1000000000000000055511151231257827021181583404541015625')
In [3]: Decimal('0.1')
Out[3]: Decimal('0.1')
Why avoid using mutable objects as default args
Default values are evaluated once, at function definition time, so every call that omits d shares the same dict.
In [1]: def f(k, v, d={}):
   ...:     d[k] = v
   ...:     return d
In [2]: f("x", 1)
Out[2]: {'x': 1}
In [3]: f("y", 2)
Out[3]: {'x': 1, 'y': 2}
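The usual fix is a None sentinel, so a fresh dict is created on every call:

def f(k, v, d=None):
    if d is None:  # runs per call, unlike the default-value expression
        d = {}
    d[k] = v
    return d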
Merging PDFs
from pypdf import PdfWriter
w = PdfWriter()
w.append("first.pdf")
w.append("second.pdf")
w.write("merged.pdf")
w.close()
Updating a list in-place
l2 = l1 = [1, 2]
l2[:] = ['a', 'b']
print(l1 is l2, l1, l2)
# True ['a', 'b'] ['a', 'b']
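For contrast, rebinding the name instead of slice-assigning breaks the aliasing:

l2 = l1 = [1, 2]
l2 = ['a', 'b']  # rebinds l2 to a new list; l1 keeps the old one
print(l1 is l2, l1, l2)
# False [1, 2] ['a', 'b']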
Moving from pip to Poetry
poetry init
cat requirements.txt | cut -d '=' -f 1 | xargs poetry add
cat requirements-dev.txt | cut -d '=' -f 1 | xargs poetry add --group=dev
rm requirements.txt requirements-dev.txt
poetry install
poetry run <command>
Installing requirements from git + ssh
pip install git+ssh://git@github.com/<org>/<repo>
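pip's VCS URL syntax also accepts a branch, tag, or commit appended with @:

pip install git+ssh://git@github.com/<org>/<repo>@<ref>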
Splitting CLI args given as a string
>>> "--system 'string w/ multiple words'".split()
['--system', "'string", 'w/', 'multiple', "words'"]
>>> import shlex
>>> shlex.split("--system 'string w/ multiple words'")
['--system', 'string w/ multiple words']
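Since Python 3.8, shlex.join is the inverse, re-quoting arguments as needed:

>>> shlex.join(['--system', 'string w/ multiple words'])
"--system 'string w/ multiple words'"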
Sending arguments into a generator
The generator must first be primed with next(), which runs it up to the first yield; after that, send() delivers a value as the result of that yield.
In [1]: def generator():
   ...:     while True:
   ...:         received = yield 'DATA'
   ...:         print('Received:', received)
   ...:
In [2]: g = generator()
In [3]: next(g)
Out[3]: 'DATA'
In [4]: g.send(1)
Received: 1
Out[4]: 'DATA'
In [5]: g.send(2)
Received: 2
Out[5]: 'DATA'
Printing logs when running pytest
pytest --log-cli-level DEBUG
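The same behavior can be enabled permanently in pytest's configuration; a pytest.ini sketch (the equivalent keys also work under [tool.pytest.ini_options] in pyproject.toml):

[pytest]
log_cli = true
log_cli_level = DEBUG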
Why use fixtures instead of module-level variables for mocked data
# without fixture
MOCK_DATA = [{"field": "value"}]

def test_one():
    MOCK_DATA[0]['field'] = 'other value'
    assert MOCK_DATA[0]['field'] == 'other value'

def test_two():
    assert MOCK_DATA[0]['field'] == 'value'

# with fixture
import pytest

@pytest.fixture
def mock_data():
    return [{"field": "value"}]

def test_three(mock_data):
    mock_data[0]['field'] = 'other value'
    assert mock_data[0]['field'] == 'other value'

def test_four(mock_data):
    assert mock_data[0]['field'] == 'value'
Running the tests in file order, test_two fails because test_one mutated the shared MOCK_DATA:

    def test_two():
>       assert MOCK_DATA[0]['field'] == 'value'
E       AssertionError: assert 'other value' == 'value'
E         - value
E         + other value

path/to/tests/test_zero.py:15: AssertionError
===================== 1 failed, 3 passed in 0.18s =====================
Why use spec when using mocks?
Without a spec, a Mock accepts any attribute access, so typos and renamed methods pass silently.
from unittest.mock import Mock
class MyClass:
    pass
# without spec
Mock().wrong_method()
# Out: <Mock name='mock.wrong_method()' id='140607049530000'>
# with spec
Mock(spec=MyClass).wrong_method()
# raises "AttributeError: Mock object has no attribute 'wrong_method'"
Why use autospec when using mocks?
spec only constrains the top-level mock: attributes of the mock are plain Mocks again. create_autospec applies the spec recursively.
from unittest.mock import create_autospec, Mock
class MyClass:
    myobj = object
# without autospec
Mock(spec=MyClass).myobj.wrong_method()
# Out: <Mock name='mock.myobj.wrong_method()' id='140671042320272'>
# with autospec
create_autospec(MyClass).myobj.wrong_method()
# raises "AttributeError: Mock object has no attribute 'wrong_method'"
Anti-Patterns
Auth
- Authlib - The ultimate library in building OAuth and OpenID Connect servers
Data
pandas
- https://www.pola.rs/ - Lightning-fast DataFrame library for Rust and Python
- axis: 0 = rows and 1 = columns
- pandas-profiling - PyPI, docs
- General
df.shape                           # (rows, columns)
df.info()
df.High.mean()                     # mean of the High column
df.Date = pd.to_datetime(df.Date)  # convert column to datetime
- Statistical information
df.describe()           # statistical summary
df.ride_duration.std()  # standard deviation of the ride_duration column
- Visualization
df.High.plot()               # line plot of the High column
df.Volume.hist()             # histogram of the Volume column
df.plot.scatter('c1', 'c2')  # scatter plot
df.Low.plot(kind='box')      # box plot
- Missing values
df.isnull().sum()                          # count missing values per column
df.isnull().sum() / df.shape[0]            # % of missing values
df.dropna(subset=['user_gender'], axis=0)  # drop rows where user_gender is NaN
Dataclasses
- attrs vs pydantic: Why I use attrs instead of pydantic
Strings
Formatting
% operator (Tweet)
%s: String conversion.
%d or %i: Integer conversion.
%f: Float conversion.
%o: Octal conversion.
%x or %X: Hexadecimal conversion.
%e or %E: Exponential notation conversion.
In [1]: "%s %d %f %o %x %e" % ("a", 1, 1.0, 8, 16, 100)
Out[1]: 'a 1 1.000000 10 10 1.000000e+02'
f-string
Source: https://fstring.help/
- debugging (Tweet)
user = "eric_idle"
f"{user=}"    # "user='eric_idle'"
f"{user = }"  # "user = 'eric_idle'"
- padding (Tweet)
value = "test"
f"{value:>10}"   # '      test'
f"{value:<10}"   # 'test      '
f"{value:_<10}"  # 'test______'
f"{value:^10}"   # '   test   '
- date
>>> from datetime import datetime
>>> d = datetime.now()
>>> f'{d:%Y-%m-%d}'
'2024-09-27'
Troubleshooting
- "[extras.pipfile_deprecated_finder.2] 'pip-shims<=0.3.4' does not match '^[a-zA-Z-_.0-9]+$'": fixed by running pre-commit autoupdate #troubleshooting
- How to enable relative line numbers in PyCharm?
Toolbox
- Background Tasks: Moved to My Toolbox - Python - Background Tasks
- CLI: Moved to My Toolbox - Python - CLI
- JSON: Moved to My Toolbox - Python - JSON
- ORM: Moved to My Toolbox - Python - ORMs
- RPC: Moved to My Toolbox - Python - RPC
- Text Parsing: Moved to My Toolbox - Python - Text Parsing
- GraphQL Server: Moved to My Toolbox - Django - GraphQL
- WebAssembly: Moved to My Toolbox - Python - WebAssembly
- https://github.com/haralyzer/haralyzer/ - Lib to read HAR files #tools
Cache
- https://github.com/grantjenks/python-diskcache
- https://github.com/uqfoundation/klepto - persistent caching to memory, disk, or database
Keycloak
- https://www.baeldung.com/postman-keycloak-endpoints
- https://github.com/marcospereirampj/python-keycloak/
- unnecessarily complex
- some strange evals: https://github.com/marcospereirampj/python-keycloak/blob/8fd315d11a42a8b4afebfe84498e882bc0b736c8/keycloak/authorization/__init__.py#L78-L91
Profiling
Profiler | What | Granularity | How
---|---|---|---
timeit | run time | snippet-level |
cProfile | run time | method-level | deterministic
statprof.py | run time | method-level | statistical
line_profiler | run time | line-level | deterministic
memory_profiler | memory | line-level | +- deterministic
pympler | memory | method-level | deterministic

Source: https://www.youtube.com/watch?v=DUCMjsrYSrQ
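For the snippet-level row, the standard-library timeit is enough; a minimal example:

import timeit

# total seconds to run the statement 10,000 times
timeit.timeit("sorted(range(100))", number=10_000)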
PyPI mirror
requests
Retry
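A minimal sketch of retrying failed requests with urllib3's Retry mounted on a Session (the status list and backoff values here are illustrative, not recommendations):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=3,                                     # at most 3 retries
    backoff_factor=0.5,                          # sleep 0.5s, 1s, 2s between attempts
    status_forcelist=[429, 500, 502, 503, 504],  # retry on these HTTP statuses
)
session.mount("https://", HTTPAdapter(max_retries=retries))
response = session.get("https://example.com/")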
SSH
- Paramiko - Homepage; Docs; Source
  - It doesn't support SOCKS5 proxy (ssh -D) - issue; third-party PR
  - Port forward example
- sshtunnel - SSH tunnels to remote server
Pipelines
AI/Data
- dagster: https://github.com/dagster-io/dagster
- cloud-native data pipeline orchestrator … integrated lineage and observability …, and best-in-class testability.
- designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports.
- cloud: https://dagster.io/
- vs. Airflow
- vs. Prefect
- Metaflow: https://github.com/Netflix/metaflow
- Prefect: https://github.com/PrefectHQ/prefect
- orchestrator for data-intensive workflows.
- build and observe resilient data workflows so that you can understand, react to, and recover from unexpected changes.
- vs. Dagster
- Pydra: https://github.com/nipype/pydra
- A simple dataflow engine with scalable semantics.
- Ray: https://github.com/ray-project/ray
- unified framework for scaling AI
- consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for simplifying ML compute
General
- Airflow
- ⭐️ Joblib
- set of tools to provide lightweight pipelining.
- Main features: disk-caching; parallel helper; fast compressed persistence.
- How does the cache work? It hashes the arguments to detect cache hits (see the sketch at the end of this section)
- Luigi
- helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.
- Mara
- Principles: Data integration pipelines as code; PostgreSQL as a data processing engine; Extensive web ui; No in-app data processing; multiprocessing - single machine pipeline execution; nodes with higher cost are run first
- Mistral
- integrated with OpenStack
- define tasks and workflows in a simple YAML and a distributed environment
- Ploomber - Docs
- pygrametl
- provides commonly used functionality for the development of ETL processes.
- Pypeln
- for creating concurrent data pipelines
- Main Features: Simple; Easy-to-use; Flexible; Fine-grained Control.
- Queues: Process; Thread; Task.
- ⭐️ pypyr
- task runner for automation pipelines
- script sequential task workflow steps in yaml
- conditional execution, loops, error handling & retries
- SCOOP
- distributed task module allowing concurrent parallel programming on various environments, from heterogeneous grids to supercomputers.
- designed from the following ideas: the future is parallel; simple is beautiful; parallelism should be simpler.
- brokers: TCP and ZeroMQ
- SpiffWorkflow
- workflow engine implemented in pure Python.
- support the development of low-code business applications in Python. Using BPMN will allow non-developers to describe complex workflow processes in a visual diagram
- Built with: lxml; celery.
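A minimal sketch of the Joblib disk cache referenced above (the cache directory is illustrative):

from joblib import Memory

memory = Memory('/tmp/joblib_cache', verbose=0)

@memory.cache
def expensive(x):
    print('computing...')
    return x ** 2

expensive(4)  # computes and stores the result on disk
expensive(4)  # cache hit: the hashed arguments match, so no recomputation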