Problems with R Markdown Word document formatting when rendered from taskscheduleR in RStudio
I have problems with the formatting of my R Markdown Word documents when I render them as part of an automatic task configured in RStudio using taskscheduleR. More precisely, the page breaks disappear. Does anyone know how I can solve this problem?
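In case it helps to narrow this down: a scheduled Rscript session does not inherit RStudio's environment, so rmarkdown can silently fall back to an older system pandoc, and Word page breaks (rmarkdown's pagebreak Lua filter) need a reasonably recent pandoc. A minimal sketch of what the scheduled script could look like, assuming the pandoc version is the culprit; the pandoc path and the file name report.Rmd are placeholders, not taken from the question. Check rmarkdown::find_pandoc() inside RStudio for the real path.

```r
# Hypothetical scheduled script. Point RSTUDIO_PANDOC at the same pandoc
# RStudio itself uses, so the scheduled render and the interactive render
# go through the same pandoc version. The path below is an assumption.
Sys.setenv(RSTUDIO_PANDOC = "C:/Program Files/RStudio/bin/pandoc")

if (file.exists("report.Rmd")) {   # report.Rmd is a placeholder name
  rmarkdown::render("report.Rmd", output_format = "word_document")
}
```

Logging rmarkdown::pandoc_version() at the top of the scheduled script would confirm whether the scheduled run and the interactive run actually use the same pandoc.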
See also questions close to this topic
-
How can I italicize part of a ggplot2 plot title when knitting the figure in an RMarkdown file?
If I'm just working in R to save a plot as a PNG, I'm able to use the {ggtext} package to incorporate basic markdown into elements of my plots, but {ggtext} outputs garbled text when I try using element_markdown() in an R chunk. I've also tried:
# plotmath knows paste(), not paste0()
my.title <- expression(paste(italic("Species name"), " Rest of Title"))
ggplot... + labs(title = my.title)
with no luck (when knitting).
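For reference, the usual {ggtext} pattern is to put the markdown in the title string itself and switch the theme element to element_markdown(). A minimal sketch, assuming ggplot2 and ggtext are installed, and using the built-in mpg data as a stand-in for the asker's plot:

```r
title_md <- "*Species name* Rest of Title"   # markdown, parsed by ggtext

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggtext", quietly = TRUE)) {
  library(ggplot2)
  # Stand-in plot: the asterisks italicize only "Species name".
  p <- ggplot(mpg, aes(displ, hwy)) +
    geom_point() +
    labs(title = title_md) +
    theme(plot.title = ggtext::element_markdown())
}
```

If element_markdown() still garbles text when knitting, a plotmath fallback without ggtext is expression(paste(italic("Species name"), " Rest of Title")); note paste(), since plotmath does not understand paste0().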
-
How to fix pandoc (error 1) in rmarkdown for html output?
I'm using pandoc version 2.18 and R version 4.1.3. I'm using rmarkdown to knit a document into an html format, but repeatedly receive the following error:
pandoc.exe: \\: openBinaryFile: invalid argument (Invalid argument)
Error: pandoc document conversion failed with error 1
Execution halted
I used the following code. However, when I change the output type to word_document, it runs without any problem.
---
title: "Test_outputtype"
output: html_document
---

```{r setup, include=FALSE}
```

### ggplot2

```{r}
library(ggplot2)
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()
```
I saw a couple of similar problems with pandoc error 1 that were resolved by changing the working directory to the C: drive, but I've already done that. This problem extends to other packages that rely on pandoc, such as flexdashboard.
Any ideas for solutions or workarounds would be greatly appreciated!
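One hedged workaround sketch, assuming the \\: in the error points at a UNC or network working directory (a common trigger for this pandoc failure on Windows): render with explicitly local output and intermediates directories. Test_outputtype.Rmd and C:/temp are placeholders here.

```r
in_file <- "Test_outputtype.Rmd"   # placeholder filename

if (file.exists(in_file)) {
  rmarkdown::render(
    in_file,
    output_format     = "html_document",
    output_dir        = "C:/temp",   # plain local path, no UNC prefix
    intermediates_dir = "C:/temp"    # keep pandoc's temp files local too
  )
}
```

If the error persists with purely local paths, that would rule this particular cause out.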
-
Rmarkdown: include plot height in resulting html <img> tag
I am working on a larger RMarkdown site. Due to the size of images on the pages, I implemented lazy loading for images using the loading="lazy" attribute of the <img> tag, via the out.extra chunk option for knitr.
This worked nicely, but now I obviously have layout shifts, e.g. when scrolling to the end of the document fast, since images are lazy-loading. The solution to this would be to specify the height and width attributes of the img tag.
In my .html output, only the width is specified. Is there a way to make knitr also detect and include the height of the plot in the resulting <img> tag?
PS: I know one can specify height and width in the chunk options; however, I want to avoid doing this manually.
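There may be a built-in option I'm not aware of, but one sketch is a knitr option hook that derives the pixel size from fig.width/fig.height and dpi (the values knitr uses when writing the file) and injects both attributes through out.extra. The helper name make_img_attrs is made up for this example:

```r
# Helper (hypothetical name): pixel dimensions = inches * dpi.
make_img_attrs <- function(fig_width, fig_height, dpi) {
  sprintf('loading="lazy" width="%.0f" height="%.0f"',
          fig_width * dpi, fig_height * dpi)
}

if (requireNamespace("knitr", quietly = TRUE)) {
  # Option hook: fig.height always has a value, so this runs for every
  # chunk and adds the attributes to the <img> tag via out.extra.
  knitr::opts_hooks$set(fig.height = function(options) {
    options$out.extra <- make_img_attrs(options$fig.width,
                                        options$fig.height,
                                        options$dpi)
    options
  })
}
```

This covers chunk-generated plots only, and chunk-level out.width/out.height or fig.retina settings would change the arithmetic, so treat it as a starting point rather than a finished solution.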
-
Support with R software
I am an early-career student researcher carrying out a meta-analysis on the incidence and prevalence of ear disease (proportion meta-analysis). I have downloaded R and RStudio and was able to import the data. I need further help. Are there any videos on the basics of R in relation to proportion meta-analysis? Or any written text, or a website, or anyone brave enough to show me live? Any help or comment is highly appreciated.
-
RStudioServer - load/upgrade preferred set of packages to default sys path (/opt/R/4.2.0/lib/R/library) from R shell
Though an 'old guy' in the unix world, I've little experience with RStudio Server and R; I'm acting here as an 'admin' for a shared server for statisticians on a cancer research study. And my unix 'admin' experience is... rusty!
I have a list of about 100 packages used by our senior statistician on the study, and would prefer to load them (and update them periodically) from an R shell directly (vs. inside RStudio Server). I think using the default system location to add them (/opt/R/4.2.0/lib/R/library) is the way to go. The RStudio Server setup, I think, still allows users to load their own packages (which is fine by me), but I thought having a default group installed per our statistician's ask is wise.
The commands I was given to load these packages have the syntax below (only including a few lines of the 90-line .sh script); they seem to install a package only if it doesn't already exist.
So... I wonder: if I set an environment variable properly, then run R (as root?) and issue all these commands, would the libraries be placed appropriately in the library folder (for 4.2) that I set in the environment variable (in this example, /opt/R/4.2.0/lib/R/library)?
Since this is Linux, we would want source installs when appropriate, which I think is the default for this OS.
If there is a better/easier way to do this, I'm all ears !
Thanks in advance for any thoughts you have...
if (!requireNamespace("markdown", quietly = TRUE)) install.packages("markdown")
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# graphical/table output
if (!requireNamespace("igraph", quietly = TRUE)) install.packages("igraph")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
# advanced regression
if (!requireNamespace("glmnet", quietly = TRUE)) install.packages("glmnet")
if (!requireNamespace("sm", quietly = TRUE)) install.packages("sm")
Environment: CentOS 8; RStudio Server 1.4.1717-3; R 4.2 (also 4.1.3, 3.6.3).
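Not an authoritative answer, but the usual pattern for this: run R as root (plain R or Rscript, outside RStudio Server) and pass the site library explicitly to install.packages() via lib, rather than relying on an environment variable. A sketch with a few of the packages from the script; the path is the one from the question, so verify it with .libPaths() on your box:

```r
site_lib <- "/opt/R/4.2.0/lib/R/library"   # path from the question
pkgs <- c("markdown", "devtools", "igraph", "ggplot2", "glmnet", "sm")

if (dir.exists(site_lib)) {
  # Install only what is missing from the site library...
  missing <- setdiff(pkgs, rownames(installed.packages(lib.loc = site_lib)))
  if (length(missing)) install.packages(missing, lib = site_lib)
  # ...and refresh everything there periodically:
  update.packages(lib.loc = site_lib, ask = FALSE)
}
```

On Linux this installs from source by default, as you suspected. The update step also works as a root cron job via Rscript -e, which may fit the "update periodically" requirement better than an interactive shell.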
-
RStudio keeps crashing when I'm trying to merge multiple csv files into a data frame. How do I fix this?
I have 12 csv files that I need to merge for an analysis project; their sizes range from 20 MB to 120 MB per file.
I attempted to cut down memory use by reading only the necessary columns with fread(), so it reads 6 columns instead of the total 11.
I've assigned each of them to a data frame as shown below.
However, at some point while doing this manually, especially when using View() on the data frame that combines the 12 csv files, RStudio crashes, probably due to memory usage; the whole environment resets and I have to do everything over again.
Is there a shorter and less ugly way to do this without crashing?
Packages <- c("dplyr", "janitor", "skimr", "readr", "lubridate", "tidyverse", "tidyr")
lapply(Packages, library, character.only = TRUE)
library("data.table")
td2105 <- fread("/cloud/project/Capstone Cyclistic Project/202105-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2106 <- fread("/cloud/project/Capstone Cyclistic Project/202106-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2107 <- fread("/cloud/project/Capstone Cyclistic Project/202107-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2108 <- fread("/cloud/project/Capstone Cyclistic Project/202108-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2109 <- fread("/cloud/project/Capstone Cyclistic Project/202109-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2110 <- fread("/cloud/project/Capstone Cyclistic Project/202110-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2111 <- fread("/cloud/project/Capstone Cyclistic Project/202111-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2112 <- fread("/cloud/project/Capstone Cyclistic Project/202112-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2201 <- fread("/cloud/project/Capstone Cyclistic Project/202201-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2202 <- fread("/cloud/project/Capstone Cyclistic Project/202202-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2203 <- fread("/cloud/project/Capstone Cyclistic Project/202203-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td2204 <- fread("/cloud/project/Capstone Cyclistic Project/202204-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name", "end_station_name", "member_casual"))
td_2105_to_2204 <- rbind(td2105, td2106, td2107, td2108, td2109, td2110, td2111, td2112, td2201, td2202, td2203, td2204)
View(td_2105_to_2204)
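A sketch of a shorter route, staying with data.table since fread() is already in use: build the file list programmatically, read each file with the same select, and bind once with rbindlist(). The directory and file-name pattern are taken from the question; the most plausible crash trigger is View() on millions of rows, so inspect with str() or head() instead.

```r
cols <- c("rideable_type", "started_at", "ended_at",
          "start_station_name", "end_station_name", "member_casual")
data_dir <- "/cloud/project/Capstone Cyclistic Project"

if (requireNamespace("data.table", quietly = TRUE)) {
  files <- list.files(data_dir, pattern = "-divvy-tripdata\\.csv$",
                      full.names = TRUE)
  # One fread() per file, one bind, no 12 intermediate data frames:
  td_2105_to_2204 <- data.table::rbindlist(
    lapply(files, data.table::fread, select = cols)
  )
  # Sanity-check without materializing the Viewer:
  str(td_2105_to_2204)
}
```

head(td_2105_to_2204) shows enough to verify the merge without the memory cost of View() on the full table.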
- Are quotes needed for a parameter in Windows Task Scheduler?
-
Hazelcast cluster member crash results in losing all scheduled tasks
We are running 4 instances of our Java application in a Hazelcast cluster. We scheduled around 2000 tasks using the scheduled executor service's schedule method, and Hazelcast partitions these 2000 tasks across the 4 instances. When one of the cluster members crashes, all the tasks assigned to the partitions owned by the crashed node are lost, while the other 3 cluster members complete their assigned tasks.
How can we overcome this problem and avoid losing tasks?
-
Modules not recognized when running script via Windows Task Scheduler
I have a script named Sku_Matching_Salesforce.py. The location of the file is:
C:\Users\User\Desktop\Personal\DABRA\Sku_Matching_Salesforce.py
This is the venv that is activated:
C:\Users\User\Desktop\Personal\DABRA\venv
If I run the script directly, it works fine, but when I run it via Windows Task Scheduler, it doesn't recognize modules. I get this error message:
Traceback (most recent call last):
  File "C:\Users\User\Desktop\Personal\DABRA\Sku_Matching_Salesforce.py", line 5, in <module>
    import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'
Traceback (most recent call last):
  File "C:\Users\User\Desktop\Personal\DABRA\Sku_Matching_JFS-Beta1.py", line 5, in <module>
    import sqlalchemy
ModuleNotFoundError: No module named 'sqlalchemy'
Traceback (most recent call last):
  File "C:\Users\User\Desktop\Personal\DABRA\Sku_Matching_Solodeportes_Beta1.py", line 5, in <module>
    import sqlalchemy
ModuleNotFoundError: No module named 'sqlalchemy'
Traceback (most recent call last):
  File "C:\Users\User\Desktop\Personal\DABRA\Sku_Matching_OpenSports_Beta1.py", line 5, in <module>
    import sqlalchemy
ModuleNotFoundError: No module named 'sqlalchemy'
Traceback (most recent call last):
  File "C:\Users\User\Desktop\Personal\DABRA\Unificador_Salida_Final.py", line 6, in <module>
    import sqlalchemy
ModuleNotFoundError: No module named 'sqlalchemy'
This is how I configured the task in Task Scheduler under the "Actions" tab:
Program or script: C:\Windows\py.exe
Optional arguments: Allscript.py (the file that runs the scripts)
Starts in: C:\Users\User\Desktop\Personal\DABRA\
I don't know what the problem is... Could someone help me, please?
Thanks in advance
-
Replicate data from database to another database
I am trying to replicate the data from one database to another. Everything else is working, but at code line 119 I am trying to transfer the exact data record and failing to get it replicated. Instead, the record is being multiplied by the number of ids and stored in the new database: e.g., if in the old database the data is stored against id 17, then in the new database the record is duplicated 17 times, where I just want the original record to be replicated.
-
Powershell script searching files on domain
Very new to PowerShell and AD, so apologies if this post has an obvious answer. I have done some research and I am still not finding the answers I am looking for. My script is below for reference.
I have created a simple PowerShell script that will run on an admin VM I have set up on my domain. I have a separate SQL VM running a backup script that consumes a lot of storage over time. My question is: do I need to modify this script in order to store it on my admin VM but have it run on my SQL VM? Or can I leave the path as-is and just set it up in Task Scheduler? I have tried targeting the FQDN and the IP, but it doesn't seem to be working either way.
$backups_file = 'E:\blahBlahBla\SQL\Backups'
# or
$backups_file = '<IP_ADDRESS>\E:\blahBlahBla\SQL\Backups'
# or
$backups_file = '<FQDN>E:\blahBlahBla\SQL\Backups'

$backup_file_exist = (Test-Path -Path $backups_file)
if ($backup_file_exist) {
    # Verifies the folder exists
    Write-Output -InputObject "This folder exists"
    # Returns all the files in the folder.
    Get-ChildItem -Path $backups_file
    # Deletes all files in the folder that are older than 7 days.
    Get-ChildItem -Path $backups_file -Recurse |
        Where-Object { ($_.LastWriteTime -lt (Get-Date).AddDays(-7)) } |
        Remove-Item
}
else {
    Write-Output -InputObject "Unable to access this directory."
}
Thanks.
-
Why are these import errors occurring when running python scripts from cmd or Windows Task Scheduler, but not Anaconda?
I am encountering import errors, but only when running my python scripts from cmd or Windows Task Scheduler (effectively the same issue, I assume). I have researched answers already and attempted various solutions (detailed below), but nothing has worked yet. I need to understand the problem in any case, so that I can manage anything like it in the future.
Here is the issue:
Windows 10. Anaconda Python 3.9.7. Virtual environment.
I have a script that works fine if I open an anaconda prompt, activate the virtual environment and run it.
However, this is where the fun starts. If I try to run the script from the non-Anaconda cmd prompt using the command "C:\Users\user\anaconda3\envs\venv\python.exe" "C:\Users\user\scripts\script.py", I get the following error:
ImportError: DLL load failed while importing etree: The specified module could not be found.
Traceback includes:
  File "C:\Users\user\anaconda3\envs\venv\lib\site-packages\lxml\html\__init__.py", line 53, in <module>
    from .. import etree
This is not as simple as one specific module not being installed, because of course running the script from within the anaconda prompt and the virtual environment works. Similar also happens when I run other scripts. Other errors I have seen include, for example:
ImportError: DLL load failed while importing _imaging: The specified module could not be found.
Traceback includes:
  File "C:\Users\user\anaconda3\envs\venv\lib\site-packages\PIL\Image.py", line 114, in <module>
    from . import _imaging as core
Also, I think this may be somehow related. Importing numpy (1.22.3) from within the python interpreter in the virtual environment works fine, but when I try to run a test script that imports numpy it fails both from anaconda and the cmd with the following error:
ImportError: cannot import name SystemRandom
The overall issue was noted originally when trying to run various scripts from Windows Task Scheduler, with the path to python "C:\Users\user\anaconda3\envs\venv\python.exe" entered as the Program/script and the script "script.py" entered as an argument. The above errors were produced, then reproduced by running the scripts from a non-Anaconda cmd.
I am looking to understand what is happening here, and for a solution that gets the scripts running from the virtual environment via Windows Task Scheduler.
Update:
I have uninstalled and reinstalled numpy (and pandas) using conda. This has left the venv with numpy==1.20.3 (and pandas==1.4.2). On attempting to re-run one of the scripts, it runs fine from within the venv in Anaconda, but produces the following error when run from cmd or from Windows Task Scheduler as above:
ImportError: Unable to import required dependencies:
numpy:

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python 3.9 from "C:\Users\user\anaconda3\envs\venv\python.exe"
  * The NumPy version is: "1.20.3"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: DLL load failed while importing _multiarray_umath: The specified module could not be found.
I have looked into the solutions suggested, but am still completely at a loss, especially as to why the script runs from the venv in one place, but NOT the other.
-
bat file in Windows Task Scheduler not running python script
I am trying to run a python script to update a ppt presentation. I also tried something similar a year ago, running a regression and updating a table in SQL, and it didn't run either; I gave up then as I couldn't resolve it.
I have managed to create a bat file to run R code in Windows Task Scheduler, and that works.
I have created this bat file and tested it in the command prompt: the py file runs and updates the ppt presentation.
When I run the same bat file from Windows Task Scheduler, it doesn't update the ppt.
Currently the bat file is as follows:
@echo off
SET log_file=C:\python\logfile.txt
echo on
call :logit >> %log_file%
exit /b 0

:logit
call C:\ProgramData\Anaconda3\Scripts\activate.bat
cd C:\python\
python Updateppt.py
These are the things I have tried so far:
- Added a log file to the bat file. The log file is created and records the three steps, so I know the bat file runs. The log file returns this:
C:\python>call C:\ProgramData\Anaconda3\Scripts\activate.bat
(base) C:\python>cd C:\python\
(base) C:\python>python Updateppt.py
- Edited the bat file into various combinations based on recommendations from Stack Overflow. Most of them worked in the command prompt, but none work in Windows Task Scheduler
- Checked the security settings on the folder where I am saving the information; I have full access
- Made sure the folder is added to PYTHONPATH in both the system and user sections of the environment variables
- Have an R file that currently runs via a bat file through Windows Task Scheduler, so I made sure all the General, Conditions and Settings sections in the task properties match that one
- Re-ran pip install on all packages to make sure they are accessible and in the right location when the py file runs. This was based on this advice: Cannot schedule a python script to run through Windows Task Scheduler
- Timed the runs: the command prompt takes 30 seconds whereas Windows Task Scheduler takes 20 seconds
- Added logging to the python file: it logs when the script starts, and it logs a time when running from Windows Task Scheduler, so the python script is being run
Is there anything I can do to get this working? I am really at a loss, and I can't seem to find a Stack Overflow answer that actually solves the issue I am having.
UPDATE
I have added timestamps after each function runs. The log file shows that when run from Windows Task Scheduler, right before the last function, it doesn't run that last function but instead loops back to the first one. It doesn't do this in the command prompt.
Windows Task Scheduler run log of python:
INFO:root:run script started at 2022-04-29 13:18:31.318567
INFO:root:loaded enc data at 2022-04-29 13:18:32.072627
INFO:root:create enc_id at 2022-04-29 13:18:32.075627
INFO:root:agg data at 2022-04-29 13:18:59.782707
INFO:root:run script started at 2022-04-29 13:19:22.904437
INFO:root:loaded enc data at 2022-04-29 13:19:23.225462
INFO:root:create enc_id at 2022-04-29 13:19:23.228464
command prompt log of python
INFO:root:run script started at 2022-04-29 13:20:48.871881
INFO:root:loaded enc data at 2022-04-29 13:20:49.051893
INFO:root:create enc_id at 2022-04-29 13:20:49.054894
INFO:root:agg data at 2022-04-29 13:21:05.040096
INFO:root:run script stopped at 2022-04-29 13:21:05.436125
It should aggregate the data and then export to ppt, after which the script stops and logs the 'run script stopped' line. Why would it run correctly in the command prompt but not Windows Task Scheduler?
This is the code it's not running
def update_ppt(CHW_daily):
    daily_figures = Presentation(ResultPath + 'Template/daily_figures_template.pptx')
    # CHW table
    slide_CHW = daily_figures.slides[0]
    table_CHW = [shape for shape in slide_CHW.shapes if shape.has_table]
    # Then we can update the values in each cell directly from the dataframe:
    for i in range(1, 8):
        for j in range(0, 6):
            table_CHW[0].table.cell(i, j).text = str(CHW_daily.iloc[i - 1, j])
            table_CHW[0].table.cell(i, j).text_frame.paragraphs[0].font.size = Pt(14)
    daily_figures.save(ResultPath + 'daily_figures.pptx')
    return ()