DLE Parser PRO

DLE Parser PRO — a professional module for automating the parsing and publishing of content from external sources in DataLife Engine. Supports three modes: HTML parsing (CSS selectors/XPath), import from RSS/Atom, and hybrid mode. Automatically detects CMSs (WordPress, Joomla, Drupal, etc.), downloads and converts images to WebP, and performs AI rewriting via DeepSeek. The built-in Round-Robin scheduler evenly distributes materials among sources.

Module version: 3.0.0

PHP version: 7.4 - 8.4

DLE version: 13.x - 19.1

Documentation

DLE Parser PRO is a comprehensive enterprise-level solution for website owners on DataLife Engine who need full automation of the process of filling a website with high-quality content. The module is a powerful system for extracting, processing, and publishing materials from external sources using advanced artificial intelligence technologies.

Module architecture: three parsing modes

HTML Parser — classic web scraping

Extracting content directly from the HTML structure of web pages
Support for complex pagination with customizable navigation patterns
Automatic detection of the site structure and CMS
Precise extraction via CSS selectors and XPath expressions
Processing dynamic content and AJAX loads
Support for bidirectional parsing (from newest to oldest / from oldest to newest)
Configuring page ranges with automatic progress tracking
Automatic downloading of files, images, videos, and galleries into DLE additional fields — via CSS selectors directly from the article HTML page
Support for all extraction types: href, src, data-src, data-href, content, text, html
Saving full HTML blocks (specification tables, formatted descriptions) into additional fields through the content cleaning filter

RSS/Atom Parser — working with news feeds

Native support for RSS 2.0, RSS 1.0 (RDF), and Atom 1.0 formats
Intelligent image extraction from multiple sources (enclosure, media:content, media:thumbnail, media:group)
Automatic processing of namespaces (media, content, dc, atom)
Extracting metadata: author, publication date, categories
Support for full and short content (content:encoded, description)
Filtering and cleaning RSS content from advertising blocks
Priority retrieval of the main image via meta[property="og:image"] and meta[property="twitter:image"] directly from the article page; the RSS image is used as a fallback source
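
The image priority described above can be sketched roughly as follows. This is an illustrative Python sketch, not the module's own code (the module runs on PHP); the names `MetaImageFinder` and `pick_main_image` are hypothetical, and accepting either the `property` or the `name` attribute is an assumption.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class MetaImageFinder(HTMLParser):
    """Hypothetical helper: collects og:image / twitter:image from <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.images: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Assumption: sites may use either property="..." or name="...".
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key in ("og:image", "twitter:image") and content:
            self.images.setdefault(key, content)  # keep the first occurrence


def pick_main_image(article_html: str, rss_image: Optional[str]) -> Optional[str]:
    """Prefer og:image, then twitter:image, then the image from the RSS item."""
    finder = MetaImageFinder()
    finder.feed(article_html)
    return (
        finder.images.get("og:image")
        or finder.images.get("twitter:image")
        or rss_image
    )
```

With this ordering the RSS enclosure is only used when the article page exposes no meta image at all.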

Hybrid Parser — the optimal combination of RSS and HTML

Using RSS to obtain a list of current materials
Parsing full content from the HTML version of the page
Priority data selection system (HTML takes precedence over RSS)
Merging metadata from both sources
Optimal processing speed with maximum extraction quality
Automatic detection of the most complete image source

Technological foundation and infrastructure

Intelligent CMS detection system

Automatic recognition of 18+ popular CMSs and frameworks
Supported platforms: WordPress, Joomla, Drupal, 1C-Bitrix, DLE, MODX, OpenCart
Blogging platforms: Ghost, Medium, Blogger, Tilda, Webflow
JavaScript frameworks and static site generators: Next.js, Gatsby, Hugo, Jekyll
E-commerce: Shopify, WooCommerce, Magento
Analysis of HTTP headers and meta tags for accurate detection
Automatic suggestion of optimal CSS selectors for each CMS

AI rewriting via DeepSeek API

Integration with DeepSeek-V3 — an advanced language model with 671B parameters
Chunk-based processing: splitting long articles into optimal fragments
Preserving the HTML structure during rewriting (tags, formatting, lists)
Three-level processing: headings, short description, full text
Customizable prompts for each type of content
Automatic removal of AI artifacts (code blocks, explanations)
Rate limiting and API error handling with automatic retries
Cost efficiency: processing cost 20 times lower than GPT-4
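
One way to picture chunk-based processing is to split an article on paragraph boundaries so that each fragment stays under a size limit. The sketch below is an assumption about how such splitting could work; the function name `split_into_chunks` and the `max_chars` limit are hypothetical and not taken from the module.

```python
import re
from typing import List


def split_into_chunks(html: str, max_chars: int = 2000) -> List[str]:
    """Group whole <p> blocks into chunks of at most max_chars characters.

    A single block longer than max_chars is kept intact as its own chunk,
    so tags are never cut in the middle.
    """
    # Split right after every closing </p>, keeping the tag with its block.
    blocks = [b for b in re.split(r"(?<=</p>)", html) if b.strip()]
    chunks: List[str] = []
    current = ""
    for block in blocks:
        if current and len(current) + len(block) > max_chars:
            chunks.append(current)
            current = ""
        current += block
    if current:
        chunks.append(current)
    return chunks
```

Because the split points sit between paragraphs, joining the chunks back together reproduces the original markup, which makes it easier to keep the HTML structure intact after rewriting.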

Two-level protection bypass system:

Level 1: Enhanced cURL
- HTTP/2 support with full Chrome 131 emulation
- Sec-Fetch-* headers for bypassing basic filtering
- Cookie persistence between requests
- Automatic detection of Cloudflare challenges
Level 2: FlareSolverr Integration (optional)
- Full-fledged headless Chrome for bypassing JavaScript challenges
- Automatic Cloudflare captcha solving
- Support for Turnstile and other protection mechanisms
- Transparent switching when a block is detected
Intelligent detection of bypass necessity:
- Checking for "Just a moment", "Checking your browser"
- Detecting cf-browser-verification
- Automatic fallback to standard cURL when available
System requirements for Cloudflare bypass:
- Docker (for FlareSolverr)
- Minimum 1GB RAM
- VPS with container support
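
The two-level flow above can be illustrated with a small sketch: fetch with the plain client first, look for the challenge markers listed above, and only then switch to the browser-based fetcher. The function names and the injected `fetch_plain` / `fetch_browser` callables are hypothetical stand-ins; this does not show the real cURL or FlareSolverr calls.

```python
from typing import Callable

# Markers named in the list above.
CHALLENGE_MARKERS = (
    "Just a moment",
    "Checking your browser",
    "cf-browser-verification",
)


def looks_like_challenge(body: str) -> bool:
    """Return True if the response body contains a known challenge marker."""
    lowered = body.lower()
    return any(marker.lower() in lowered for marker in CHALLENGE_MARKERS)


def fetch_with_fallback(
    url: str,
    fetch_plain: Callable[[str], str],
    fetch_browser: Callable[[str], str],
) -> str:
    """Try the cheap fetcher first; use the browser only when blocked."""
    body = fetch_plain(url)
    if looks_like_challenge(body):
        body = fetch_browser(url)
    return body
```

Keeping the fetchers injectable means regular sites never pay the cost of a headless browser.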

Professional image processing

Automatic image downloading with HTTPS and redirect support
Conversion to WebP to save 30-50% of disk space
Intelligent resize while preserving proportions (GD/Imagick)
Support for multiple formats: JPEG, PNG, GIF, WebP
Saving the main image in xfield with metadata
Replacing all images in content with local copies
Automatic generation of unique file names
File structure organization by date (YYYY-MM)
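
As a rough illustration of proportional resizing and date-based storage, the sketch below computes target dimensions and a file path. Both helper names are hypothetical, and the hash-based file name is only one possible naming scheme (it is unique per source URL, nothing more); the module's real naming rules are not documented here.

```python
import hashlib
from datetime import date
from typing import Tuple


def fit_within(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Scale dimensions down so the longest side is max_side, keeping proportions."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def build_image_path(source_url: str, day: date, ext: str = "webp") -> str:
    """Build a YYYY-MM path with a file name derived from the source URL."""
    digest = hashlib.md5(source_url.encode("utf-8")).hexdigest()[:12]
    return f"uploads/posts/{day:%Y-%m}/{digest}.{ext}"
```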

Round-Robin task scheduler

Even load distribution across all active sources
Automatic source rotation for balanced import
Progress tracking for each source individually
Configuring the number of posts per CRON run
Protecting the CRON endpoint with a Secret Key (32-character token)
Detailed logging of all parsing operations
Support for both old (engine/ajax/controller.php) and new (index.php?controller=ajax) DLE versions
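
Round-robin distribution of a per-run post quota can be sketched as below. This is a simplified illustration under stated assumptions: `plan_run` is a hypothetical name, and the real scheduler presumably also persists the rotation position between CRON runs.

```python
from typing import Dict, List, Tuple


def plan_run(
    sources: List[str], posts_per_run: int, start_index: int = 0
) -> Tuple[Dict[str, int], int]:
    """Hand out posts_per_run slots to sources in rotation.

    Returns the per-source slot counts and the index to start from next run.
    """
    counts = {source: 0 for source in sources}
    if not sources:
        return counts, 0
    index = start_index % len(sources)
    for _ in range(posts_per_run):
        counts[sources[index]] += 1
        index = (index + 1) % len(sources)
    return counts, index
```

For example, three sources with five posts per run receive 2, 2, and 1 slots; the next run starts from the third source, so no source is favored over time.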

Category management system

Intelligent category mapping

Automatic collection of categories from RSS feeds and HTML structure
Batch processing of articles to extract all unique categories
Visual interface for mapping source categories to DLE categories
Support for hierarchical DLE categories
Default category for unmapped materials
Multiple categories for a single item
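
Category mapping with a default fallback could look roughly like this. The function name, the case-insensitive matching, and the integer category IDs are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def map_categories(
    source_categories: Iterable[str],
    mapping: Dict[str, int],
    default_id: int,
) -> List[int]:
    """Translate source category names to target category IDs.

    mapping keys are expected in lowercase. Unmapped names are ignored;
    if nothing matches, the default category is used.
    """
    ids: List[int] = []
    for name in source_categories:
        category_id = mapping.get(name.strip().lower())
        if category_id is not None and category_id not in ids:
            ids.append(category_id)
    return ids or [default_id]
```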

Protection and reliability

Duplicate prevention system

Checking for the existence of a material by the source URL in xfields
Tracking the last processed position (page/URL)
Automatic skipping of already imported materials
Saving progress to the database for each source
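
A duplicate check by source URL is more reliable when URLs are normalized first. The sketch below shows one possible normalization (trailing slash, fragment, tracking parameters); the list of tracking keys and both function names are assumptions, not the module's actual rules.

```python
from typing import Set
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking parameters; the real list may differ.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def normalize_url(url: str) -> str:
    """Lowercase scheme/host, drop fragment, tracking params, trailing slash."""
    parts = urlsplit(url.strip())
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def is_duplicate(url: str, seen: Set[str]) -> bool:
    """Check a URL against already imported (normalized) source URLs."""
    return normalize_url(url) in seen
```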

Operational stability

Automatic reconnection to the database on timeouts
cURL error handling with detailed logging
Support for SSL certificates and bypassing blocks
User-Agent rotation to simulate browser requests
Timeout control for long operations

Cloudflare Bypass via FlareSolverr

Integration with FlareSolverr to bypass Cloudflare Bot Management
Automatic switching to a headless browser when protection is detected
Optional activation via settings (not required for all sources)
Graceful degradation: works with regular sites when FlareSolverr is disabled
Docker-based solution with automatic session management
Support for JavaScript challenges and cookie-based checks
Detailed logging of protection bypass attempts

Advanced features

Additional fields: downloading files, media, and galleries

Configure any number of additional fields for each source directly from the add/edit form
For each field, set: the element CSS selector, the extraction attribute (href, src, data-src, data-href, content, text, html), and the action type
Supported action types: saving URL/text, file download, image download with metadata, video download, external video link (YouTube/Vimeo), gallery with bulk image download, gallery from a URL list
Gallery mode: automatic traversal of all found elements by selector, downloading each one and saving in DLE gallery format into a single field
Video files and downloadable files are saved in uploads/public_files/ with date-based organization (YYYY-MM)
Images from additional fields are saved in uploads/posts/ with automatic size detection and metadata generation in DLE format (width×height, file size)
Video fields are formatted in native DLE format: type 3 (local video) or type 1 (external link)
The extractExtraFieldsFromDom() method has been moved to the base BaseParser class (protected) — available for both HTML and Hybrid parsers without code duplication
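
To make the extraction attributes concrete, here is a small sketch of how a value could be read from a matched element depending on the configured attribute. It uses Python's `xml.etree.ElementTree`, which only handles well-formed markup and a limited XPath subset, so it is an illustration of the idea rather than a model of `extractExtraFieldsFromDom()`; `extract_value` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_value(element: ET.Element, attribute: str) -> Optional[str]:
    """Read a value from an element according to the extraction attribute.

    "text" returns the concatenated text, "html" returns the serialized
    element, anything else (href, src, data-src, ...) is read as an attribute.
    """
    if attribute == "text":
        return "".join(element.itertext()).strip()
    if attribute == "html":
        return ET.tostring(element, encoding="unicode")
    return element.get(attribute)


# Usage: locate the element, then apply the configured attribute.
doc = ET.fromstring(
    '<div><a class="dl" href="/files/a.zip">Download <b>now</b></a></div>'
)
link = doc.find(".//a[@class='dl']")
file_url = extract_value(link, "href")
```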

Pagination and navigation setup

Support for standard patterns: /page/{page}/, ?page={page}, /p/{page}, /offset/{page}
Custom patterns for non-standard sites
Query parameters and complex URL schemes
Automatic construction of the next page URL
Configurable page range (start_page, end_page)
Specify the number of posts per page for accurate tracking
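
Building page URLs from a pattern with a `{page}` placeholder can be sketched as follows. The function name and the `reverse` flag (standing in for the parsing direction) are hypothetical.

```python
from typing import List


def build_page_urls(
    base_url: str,
    pattern: str,
    start_page: int,
    end_page: int,
    reverse: bool = False,
) -> List[str]:
    """Expand a pagination pattern into URLs for pages start_page..end_page."""
    pages = list(range(start_page, end_page + 1))
    if reverse:
        pages.reverse()
    # Path-style patterns ("/page/{page}/") replace the base trailing slash;
    # query-style patterns ("?page={page}") are appended to the base as-is.
    prefix = base_url.rstrip("/") if pattern.startswith("/") else base_url
    return [prefix + pattern.replace("{page}", str(n)) for n in pages]
```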

Flexible selector configuration

Support for CSS selectors of any complexity (classes, IDs, attributes, pseudo-classes)
XPath compatibility for complex structures
Exclusion selectors to remove ads and noise
Built-in tester with result preview
Selector validation before saving

Administrative panel

Intuitive interface for managing sources
Detailed statistics for each source (processed materials, progress, last run)
Quick enable/disable of sources
Reset progress for reprocessing
Editing sources while preserving progress
Built-in module update checking system
Logging all actions in admin_logs

Intelligent image preservation system during AI processing:

- Media element extraction before rewriting:
  - Automatic detection of <img>, <figure>, <picture>, <iframe>, <video>
  - Replacement with HTML comment placeholders
  - Preserving positions in the document structure
- Three-level restoration system:
  - Level 1: Direct matching by markers
  - Level 2: Intelligent insertion between paragraphs
  - Level 3: Appending to the end of the document in case of complete loss
- Final cleanup:
  - Removing accidentally saved markers from title/description
  - Normalizing HTML structure
  - Validating media elements
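
The placeholder approach can be sketched like this: media elements are swapped for HTML comment markers before rewriting and put back afterwards. The sketch covers only level 1 (direct marker matching) and level 3 (appending lost elements at the end); level 2 is omitted. The marker format, the regular expression, and the function names are assumptions, and a regex is used only for brevity.

```python
import re
from typing import Dict, List, Tuple

# Paired media containers first, then standalone <img> tags.
MEDIA_RE = re.compile(
    r"<(figure|picture|video|iframe)\b.*?</\1>|<img\b[^>]*>",
    re.IGNORECASE | re.DOTALL,
)


def extract_media(html: str) -> Tuple[str, Dict[str, str]]:
    """Replace media elements with comment markers; return text and a lookup."""
    media: Dict[str, str] = {}

    def stash(match: "re.Match[str]") -> str:
        marker = f"<!--MEDIA_{len(media)}-->"
        media[marker] = match.group(0)
        return marker

    return MEDIA_RE.sub(stash, html), media


def restore_media(rewritten: str, media: Dict[str, str]) -> str:
    """Put media back by marker; append anything whose marker was lost."""
    lost: List[str] = []
    for marker, element in media.items():
        if marker in rewritten:
            rewritten = rewritten.replace(marker, element)  # level 1
        else:
            lost.append(element)
    return rewritten + "".join(lost)  # level 3
```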

Multiple sources for extracting the main image:

- Open Graph and Twitter meta tags:
  - meta[property="og:image"]
  - meta[name="twitter:image"]
  - meta[name="twitter:image:src"]
- Responsive images:
  - Support for the srcset attribute
  - Automatic selection of the highest resolution
  - Fallback to data-src and data-lazy-src
- Nested structures:
  - Extraction from <figure>, <picture> containers
  - Searching for img inside wrapper elements
  - Support for CSS background-image
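
Selecting the highest-resolution candidate from `srcset` can be sketched as below. The function names and the fallback order after `srcset` are assumptions. Splitting on commas is a simplification that would break on URLs containing commas.

```python
from typing import Dict, Optional


def best_from_srcset(srcset: str) -> Optional[str]:
    """Return the URL with the largest width (w) or density (x) descriptor."""
    best_url: Optional[str] = None
    best_score = -1.0
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue
        score = 1.0  # a missing descriptor means an implicit 1x
        if len(parts) > 1:
            try:
                score = float(parts[1][:-1])  # strip the trailing "w" or "x"
            except ValueError:
                continue  # skip malformed descriptors
        if score > best_score:
            best_url, best_score = parts[0], score
    return best_url


def pick_image_url(attrs: Dict[str, str]) -> Optional[str]:
    """Prefer srcset, then lazy-loading attributes, then plain src."""
    if attrs.get("srcset"):
        best = best_from_srcset(attrs["srcset"])
        if best:
            return best
    return attrs.get("data-src") or attrs.get("data-lazy-src") or attrs.get("src")
```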

Advantages of use

Time savings: full automation of the site content filling process — from parsing to publishing
Content uniqueness: AI rewriting ensures the originality of texts that pass plagiarism checks
SEO optimization: automatic generation of SEF URLs (alt_name), structured data
Low cost: using DeepSeek reduces AI costs by 20 times compared to GPT-4
Scalability: unlimited number of sources with Round-Robin balancing
Reliability: protection against duplicates, automatic connection recovery
Ease of setup: CMS auto-detection, built-in selector tester
Versatility: support for any sites with HTML structure, RSS feeds and hybrid schemes
Modularity: flexible architecture with the ability to disable unnecessary components
Performance: chunk-based processing, optimized SQL queries
Bypassing site protection: automatic bypass of Cloudflare and other anti-bot systems without proxy services
Configuration flexibility: ability to work with both protected and regular sources
Proxy savings: FlareSolverr — a free alternative to paid proxy services

Use Cases

News aggregators: automatic collection of news from several regional sources
Topical blogs: translation and adaptation of foreign content for a Russian-speaking audience
Review portals: import of reviews of technologies, gadgets, and software
Regional media: aggregation of local news followed by rewriting
Entertainment resources: automatic filling of sections with articles, guides, and top lists
Educational platforms: import of educational materials, articles, and manuals
Business portals: collection of industry news and analytics

Technical requirements and compatibility

DLE versions: 13.x, 14.x, 15.x, 16.x, 17.x, 18.x, 19.x, 19.1 (full compatibility)
PHP: 7.4+ (recommended 8.0+)
PHP extensions: cURL, DOM, XPath, libxml, GD or Imagick, JSON, mbstring
MySQL: 5.7+ or MariaDB 10.2+
Access rights: write access to /uploads/posts/, /engine/data/, /engine/cache/
External APIs: DeepSeek API (optional, for AI rewriting)
CRON: access to crontab task configuration

Choose a suitable plan

We offer flexible licensing options depending on your needs.

Standard

5000 ₽

Unlimited number of sites
Open source code
Basic support
No further updates

Extended

6000 ₽

Unlimited number of sites
Open source code
Priority support
Free updates (12 months)

Premium

11000 ₽

Unlimited number of sites
Open source code
Priority support + consultation
Free updates — forever
Module installation and setup
Adapted to your website (including reasonable code refinement for individual requirements)

Changelog

Releases: 9

Features: 21

Fixes: 14

Improvements: 13

Version 3.0.0 (27.04.2026)

New

Added full integration with DLE Multi-Language: translations are saved automatically to title_{iso}, short_story_{iso}, full_story_{iso}, and tags_{iso}.

New

Added a new Sitemap parsing mode with support for large sitemap files, nested sitemap indexes, and caching of the URL list.

New

Added a true dry-run test mode: the check now simulates a full parse without writing to the database and shows the resulting publish payload.

New

Added structured parsing logs with processing stages, statuses, execution time, source_id, item_url, and error information.

New

Added source health monitoring: health status, fail streak, duplicate rate, average fetch/run time, and time of the last successful run.

Improvement

Completely reworked the HTML parsing logic for item lists: all matched containers are now processed, not just the first matched node.

Improvement

HTML progress now uses a URL/cursor model instead of count-based progress, which reduces the risk of skipping new items.

Improvement

Fixed the cursor strategy for RSS, Hybrid, and Sitemap in new_to_old mode so that new items at the top of the source are not skipped.

Improvement

Improved Hybrid mode: added per-item error handling, cursor advancement on failures, and protection against getting stuck indefinitely on a single item.

Improvement

Added support for an HTML category selector in Hybrid mode and a policy for merging RSS/HTML categories.

Improvement

Strengthened duplicate detection: added URL normalization, GUID/external id, title fingerprint, and content hash.

Improvement

Improved URL normalization before duplicate checks: trailing slashes, fragments, tracking parameters, and differences in link format are taken into account.

Improvement

Strengthened the CSS selector engine: added support for groups, combinators, attribute selectors, and a number of pseudo-selectors.

Improvement

Added warnings about the supported subset of CSS selectors to the help section and the test result.

Improvement

Improved AI processing of HTML: tag structure is preserved, media/code/pre blocks are protected, incomplete translations are re-checked, and long content is handled more reliably.

Improvement

Improved tag generation and translation, including a fallback mechanism for when the AI does not return a valid result.

Fix

Fixed image saving when reformat is disabled: the file's actual original format is now kept.

Fix

Fixed cases where the AI could return links or HTML that did not comply with the content cleaning settings.

Fix

Fixed handling of figure/img blocks: images are now correctly extracted, cleaned, and can be uploaded to the server.

Fix

Fixed cases where code/pre blocks could be skipped or removed during AI processing.

Fix

Fixed problems with unclosed ul/ol/li tags in AI translations.

Fix

Fixed compatibility of the DB reconnect check with PHP 8 and mysqli.

Fix

Found and fixed other minor bugs.

Version 2.1.4 (11.03.2026)

Fix

Some bugs were found and fixed.

Version 2.1.3 (08.03.2026)

Fix

Some bugs were found and fixed.

Version 2.1.2 (28.02.2026)

Fix

Some bugs were found and fixed.

Version 2.1.1 (27.02.2026)

New

Added automatic tag generation via DeepSeek AI: it analyzes the article title and text and suggests tags in Russian

New

New setting in the "General" section: enable/disable tag generation, with a note on its dependency on AI Rewrite

New

A customizable prompt for tag generation has been added to the "AI Rewrite" section

Version 2.1.0 (24.02.2026)

New

Added support for DLE 19.1

New

The parser can now automatically download files, videos, images, and entire galleries from the donor site and save them into your site's additional fields. Torrents, covers, screenshots, local videos: everything is pulled in automatically

New

Any text block from the page can be saved into an additional field, for example a specifications table or a description, with its formatting intact

New

When parsing RSS, the module now tries to take the article cover from the page itself (higher quality) rather than from the RSS feed

New

A detailed guide to additional fields has been added to the "Help" section, with examples for each data type (file, photo, video, gallery, text) and tips on common mistakes

Fix

Some minor bugs were found and fixed.

Version 2.0.0 (15.02.2026)

New

Added Proxy support (HTTP/SOCKS5) for bypassing blocks and changing IP

New

FlareSolverr integration for automatic bypass of Cloudflare protection

New

Flexible content cleaning configuration from the admin panel (removal of scripts, styles, links, attributes)

New

FlareSolverr fallback for images when loading via cURL fails

New

Automatic detection and use of the Proxy from settings in all AJAX endpoints

New

Real-time FlareSolverr status check

New

Proxy testing directly from settings, with IP and geolocation detection

Improvement

Improved architecture

Improvement

Updated settings interface: 6 categories (General, Cloudflare, Proxy, Content cleaning, AI Rewrite, Images)

Fix

Fixed problems loading pages protected by Cloudflare

Fix

Resolved function name conflicts between different modules

Version 1.0.1 (05.01.2026)

Fix

Some bugs were found and fixed.

Version 1.0.0 (03.01.2026)

New

First release of the module

Comments 6

kosti kazancev Visitors

24 June 2026 13:04

0

I'm also interested in how the template is configured when the donor site uses cross-links. For categories you have a single select, but the donor may have several categories, and in different places, such as genre, year, and so on. It's also unclear how a category gets assigned to a given word. The same question applies to additional fields: as with categories, the text there may sit inside links or in other tags.

kosti kazancev Visitors

24 June 2026 11:05

0

Hello. Good developers set up a stripped-down demo admin panel with the plugin installed so that buyers can try out the plugin and its settings and test whether it suits them. Screenshots are nice, but hands-on testing is always better. As I understand it, you don't have such a demo, right?

oldnick Visitors

15 March 2026 07:03

0

Please clarify the approximate cost of a DeepSeek rewrite per 1,000 characters. Is it possible to configure text shortening, for example rewriting a 5,000-character article into a 1,000-character one? If there are already rewrite modules that work with other AIs, can they be plugged into this module so that the best model for a given topic can be chosen in the settings? For example, Gemini writes on medicine, Claude on IT, DeepSeek on art, and so on. Or the choice could be based on token cost, switching to whichever AI is cheaper.

Masterwen Visitors

10 March 2026 07:23

0

Hello. What would an AI rewrite via DeepSeek cost, for example, per news item? And how do I top up the DeepSeek balance?

Ghost Clients

9 February 2026 23:17

+3

Hello! I'm planning to buy DLE Parser PRO. Could you tell me whether I can use one license on 2-3 sites at the same time? Or is it strictly tied to a single domain?

1. admin Admin
  
  9 February 2026 23:19
  
  +1
  
  Hello. Yes, you can. The module is open source and is not tied to a domain. Once purchased, you can use it on an unlimited number of sites.
  
