Python
How to print a tree (Tweet)
```python
def print_tree(tree, indent=2, level=0):
    for name, child in tree.items():
        print(' ' * indent * level + name)
        print_tree(child, indent, level + 1)
```
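A quick self-contained run of the snippet, on a made-up directory-like dict:

```python
def print_tree(tree, indent=2, level=0):
    # Recursively print each key, indented by its depth in the dict.
    for name, child in tree.items():
        print(' ' * indent * level + name)
        print_tree(child, indent, level + 1)

# Hypothetical nested structure for illustration:
print_tree({"src": {"app": {}, "tests": {}}, "docs": {}})
# src
#   app
#   tests
# docs
```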
How to get the args names from a function?
```python
import inspect

inspect.getfullargspec(<func>).args
```

```python
In [1]: import inspect

In [2]: def f(x, y):
   ...:     pass
   ...:

In [3]: inspect.getfullargspec(f).args
Out[3]: ['x', 'y']
```
Why avoid initing Decimal from float (Tweet)
```python
In [1]: from decimal import Decimal

In [2]: Decimal(0.1)
Out[2]: Decimal('0.1000000000000000055511151231257827021181583404541015625')

In [3]: Decimal('0.1')
Out[3]: Decimal('0.1')
```
Why avoid using mutable objects as default args (Tweet)
```python
In [1]: def f(k, v, d={}):
   ...:     d[k] = v
   ...:     return d

In [2]: f("x", 1)
Out[2]: {'x': 1}

In [3]: f("y", 2)
Out[3]: {'x': 1, 'y': 2}
```
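The usual fix (the pattern recommended in the Python docs) is a `None` sentinel, so the mutable default is created fresh on every call:

```python
def f(k, v, d=None):
    # A new dict is created per call unless the caller passes one in.
    if d is None:
        d = {}
    d[k] = v
    return d

f("x", 1)  # {'x': 1}
f("y", 2)  # {'y': 2}, no leftover 'x' from the previous call
```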
How to merge PDFs (Tweet)
```python
from pypdf import PdfWriter

w = PdfWriter()
w.append("first.pdf")
w.append("second.pdf")
w.write("merged.pdf")
w.close()
```
Updating list in-place (Tweet)
```python
l2 = l1 = [1, 2]
l2[:] = ['a', 'b']
print(l1 is l2, l1, l2)  # True ['a', 'b'] ['a', 'b']
```
https://github.com/haralyzer/haralyzer/ - Lib to read HAR files #tools
#troubleshooting If a pre-commit hook fails with:

```
[extras.pipfile_deprecated_finder.2] 'pip-shims<=0.3.4' does not match '^[a-zA-Z-_.0-9]+$'
```

run `pre-commit autoupdate` to bump the failing hook to a fixed release.
How to move from pip to Poetry (Tweet)
```shell
poetry init
cat requirements.txt | cut -d '=' -f 1 | xargs poetry add
cat requirements-dev.txt | cut -d '=' -f 1 | xargs poetry add --group=dev
rm requirements.txt requirements-dev.txt
poetry install
poetry run <command>
```
How to use global packages using Poetry?
```shell
pip install pipx
pipx install <package>
```
Install requirements from git using ssh
pip install git+ssh://git@github.com/<org>/<repo>
Anti-Patterns
Auth
- Authlib - The ultimate library in building OAuth and OpenID Connect servers
Background tasks
Relates to Message Queues
- Celery
- Dramatiq
- django_dramatiq - Django integration
- dramatiq-pg - PostgreSQL as Broker
- django-dramatiq-pg - Django integration with PostgreSQL as broker
- huey - a little task queue for python
- Procrastinate - PostgreSQL-based Task Queue for Python
- rq (Redis Queue) - library for queueing jobs and processing them in the background with workers.
Cache
- https://github.com/grantjenks/python-diskcache
- https://github.com/uqfoundation/klepto - persistent caching to memory, disk, or database
CLI
Data
pandas
- https://www.pola.rs/ - Lightning-fast DataFrame library for Rust and Python
- axis: 0 = row, 1 = column
- pandas-profiling - PyPI, docs
- General

```python
df.shape                           # (rows, columns)
df.info()
df.High.mean()                     # mean of the High column
df.Date = pd.to_datetime(df.Date)  # convert column to datetime
```
- Summary statistics

```python
df.describe()           # summary statistics
df.ride_duration.std()  # standard deviation of the ride_duration column
```
- Visualization

```python
df.High.plot()               # line plot of the High column
df.Volume.hist()             # histogram of the Volume column
df.plot.scatter('c1', 'c2')  # scatter plot
df.Low.plot(kind='box')      # box plot
```
- Missing values

```python
df.isnull().sum()                          # count of NaN values per column
df.isnull().sum() / df.shape[0]            # % of missing values
df.dropna(subset=['user_gender'], axis=0)  # drop rows where user_gender is NaN
```
Dataclasses
- attrs vs pydantic: Why I use attrs instead of pydantic
JSON
- cysimdjson - Python bindings for simdjson, a C++ JSON parser, reportedly the fastest JSON parser on the planet
- ijson - iterative JSON
- orjson - fast, supports NumPy
- rapidjson - RapidJSON is an extremely fast C++ JSON parser and serialization library
- ujson - written in C with Python bindings
ORM
- PugSQL - simple interface for using parameterized SQL
Pipelines
AI/Data
- dagster: https://github.com/dagster-io/dagster
- cloud-native data pipeline orchestrator … integrated lineage and observability …, and best-in-class testability.
- designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports.
- cloud: https://dagster.io/
- vs. Airflow
- vs. Prefect
- Metaflow: https://github.com/Netflix/metaflow
- Prefect: https://github.com/PrefectHQ/prefect
- orchestrator for data-intensive workflows.
- build and observe resilient data workflows so that you can understand, react to, and recover from unexpected changes.
- vs. Dagster
- Pydra: https://github.com/nipype/pydra
- A simple dataflow engine with scalable semantics.
- Ray: https://github.com/ray-project/ray
- unified framework for scaling AI
- consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for simplifying ML compute
General
- Airflow
- ⭐️ Joblib
- set of tools to provide lightweight pipelining.
- Main features: disk-caching; parallel helper; fast compressed persistence.
- How does the cache work? It hashes the arguments to detect whether they changed
- Luigi
- helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.
- Mara
- Principles: Data integration pipelines as code; PostgreSQL as a data processing engine; Extensive web ui; No in-app data processing; multiprocessing - single machine pipeline execution; nodes with higher cost are run first
- Mistral
- integrated with OpenStack
- define tasks and workflows in a simple YAML and a distributed environment
- Ploomber - Docs
- pygrametl
- provides commonly used functionality for the development of ETL processes.
- Pypeln
- for creating concurrent data pipelines
- Main Features: Simple; Easy-to-use; Flexible; Fine-grained Control.
- Queues: Process; Thread; Task.
- ⭐️ pypyr
- task runner for automation pipelines
- script sequential task workflow steps in yaml
- conditional execution, loops, error handling & retries
- SCOOP
- distributed task module allowing concurrent parallel programming on various environments, from heterogeneous grids to supercomputers.
- designed from the following ideas: the future is parallel; simple is beautiful; parallelism should be simpler.
- brokers: TCP and ZeroMQ
- SpiffWorkflow
- workflow engine implemented in pure Python.
- supports the development of low-code business applications in Python; using BPMN allows non-developers to describe complex workflow processes in a visual diagram
- Built with: lxml; celery.
Profiling
| Profiler        | What     | Granularity   | How              |
|-----------------|----------|---------------|------------------|
| timeit          | run time | snippet-level |                  |
| cProfile        | run time | method-level  | deterministic    |
| statprof.py     | run time | method-level  | statistical      |
| line_profiler   | run time | line-level    | deterministic    |
| memory_profiler | memory   | line-level    | +- deterministic |
| pympler         | memory   | method-level  | deterministic    |

Source: https://www.youtube.com/watch?v=DUCMjsrYSrQ
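A minimal stdlib-only sketch of the first two rows of the table (`timeit` for snippets, `cProfile` for the deterministic method-level view):

```python
import timeit
import cProfile

# timeit: run the snippet `number` times and return the total elapsed seconds.
elapsed = timeit.timeit("sorted(range(1000))", number=100)
print(f"{elapsed:.4f}s for 100 runs")

# cProfile: per-function call counts and cumulative times for the same snippet.
cProfile.run("sorted(range(1000))")
```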
PyCharm
- How to enable relative line numbers?
PyPI mirror
Retry
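A minimal hand-rolled retry decorator, as a reference sketch; libraries such as `tenacity` cover the same ground with backoff and jitter built in:

```python
import time

def retry(times=3, delay=0.1, exceptions=(Exception,)):
    # Re-run the wrapped function up to `times` attempts,
    # sleeping `delay` seconds between failed tries.
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise  # exhausted: re-raise the last error
                    time.sleep(delay)
        return wrapper
    return decorator
```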
Strings
Formatting
% operator (Tweet)

- `%s`: String conversion.
- `%d` or `%i`: Integer conversion.
- `%f`: Float conversion.
- `%o`: Octal conversion.
- `%x` or `%X`: Hexadecimal conversion.
- `%e` or `%E`: Exponential notation conversion.
In [1]: "%s %d %f %o %x %e" % ("a", 1, 1.0, 8, 16, 100)
Out[1]: 'a 1 1.000000 10 10 1.000000e+02'
f-string
Source: https://fstring.help/
- debugging (Tweet)
user = "eric_idle" f"{user=}" # "user='eric_idle'" f"{user = }" # "user = 'eric_idle'"
- padding (Tweet)
value = "test" f"{value:>10}" # ' test' f"{value:<10}" # 'test ' f"{value:_<10}" # 'test______' f"{value:^10}" # ' test '
Parsing
- parse - Parse strings using a specification based on the Python format() syntax.
- https://github.com/jenisys/parse_type - extends with the following features: build type converters; compose type converters; CardinalityField naming schema
- ttp - Template Text Parser
Tests
- How to print logs when running pytest? (Tweet)
pytest --log-cli-level DEBUG
Fixtures
Why to use fixtures instead of module-level variables for mocked data (Tweet)
```python
import pytest

# without fixture
MOCK_DATA = [{"field": "value"}]

def test_one():
    MOCK_DATA[0]['field'] = 'other value'
    assert MOCK_DATA[0]['field'] == 'other value'

def test_two():
    assert MOCK_DATA[0]['field'] == 'value'  # fails: test_one mutated MOCK_DATA

# with fixture
@pytest.fixture
def mock_data():
    return [{"field": "value"}]

def test_three(mock_data):
    mock_data[0]['field'] = 'other value'
    assert mock_data[0]['field'] == 'other value'

def test_four(mock_data):
    assert mock_data[0]['field'] == 'value'  # passes: fresh data per test
```
```
def test_two():
>       assert MOCK_DATA[0]['field'] == 'value'
E       AssertionError: assert 'other value' == 'value'
E         - value
E         + other value

path/to/tests/test_zero.py:15: AssertionError
===================== 1 failed, 3 passed in 0.18s =====================
```
Speccing
- Why to use spec when using Mock? (Tweet)
```python
from unittest.mock import Mock

class MyClass:
    pass

# without spec
Mock().wrong_method()
# Out: <Mock name='mock.wrong_method()' id='140607049530000'>

# with spec
Mock(spec=MyClass).wrong_method()
# raises "AttributeError: Mock object has no attribute 'wrong_method'"
```
- Why to use autospec? (Tweet)
```python
from unittest.mock import create_autospec, Mock

class MyClass:
    myobj = object

# without autospec: spec is not applied recursively to attributes
Mock(spec=MyClass).myobj.wrong_method()
# Out: <Mock name='mock.myobj.wrong_method()' id='140671042320272'>

# with autospec: attributes are specced too
create_autospec(MyClass).myobj.wrong_method()
# raises "AttributeError: Mock object has no attribute 'wrong_method'"
```
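`patch.object` accepts `autospec=True` as well, building the mock from the real attribute so the call signature is enforced (the `Service` class here is a made-up example):

```python
from unittest.mock import patch

class Service:
    def fetch(self, url):
        return "real"

# autospec=True: the replacement mock copies fetch's signature.
with patch.object(Service, "fetch", autospec=True) as mock_fetch:
    mock_fetch.return_value = "fake"
    svc = Service()
    print(svc.fetch("http://example.com"))  # fake
    try:
        svc.fetch()  # missing `url`: rejected because the spec knows the signature
    except TypeError as exc:
        print("TypeError:", exc)
```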
Web
GraphQL Server
- Ariadne - https://ariadnegraphql.org/
- Graphene - https://graphene-python.org/
- Has not been maintained
- Strawberry - https://strawberry.rocks/
- Hard to understand the codebase
- Tartiflette - https://tartiflette.io/
- Needs to write the resolvers by hand. I didn’t find a good integration w/ ORMs
Keycloak
- https://www.baeldung.com/postman-keycloak-endpoints
- https://github.com/marcospereirampj/python-keycloak/
- unnecessarily complex
- some strange evals: https://github.com/marcospereirampj/python-keycloak/blob/8fd315d11a42a8b4afebfe84498e882bc0b736c8/keycloak/authorization/init.py#L78-L91
requests
RPC
- gRPC
- RPyC - Docs - library for symmetrical remote procedure calls, clustering, and distributed-computing #OpenSource
SSH
- Paramiko - Homepage; Docs; Source
  - It doesn’t support SOCKS5 proxy (`ssh -D`) - issue; third-party PR
  - Port forward example
- sshtunnel - SSH tunnels to remote server
  - It doesn’t support SOCKS5 proxy (`ssh -D`)